Russian SuperGLUE

Dataset	Score	Metric
LiDiRus	0.497	Matthews Corr
RCB	0.497 / 0.541	F1/Acc
PARus	0.842	Accuracy
MuSeRC	0.916 / 0.773	F1a/EM
TERRa	0.871	Accuracy
RUSSE	0.823	Accuracy
RWSD	0.669	Accuracy
DaNetQA	0.889	Accuracy
RuCoS	0.9 / 0.902	F1/EM

Model description:

FRED-T5 1.7B (Full-scale Russian Enhanced Denoisers T5) is a model with an architecture based on T5. It has 24 layers and a hidden size of 1536. The model was trained on a mixture of 7 denoisers, similar to UL2 but with several differences, on a Russian-language corpus (300 GB). The dataset is the same as the one used for the ruT5 models. The tokenizer is BBPE (byte-level BPE). For the first half of training, the model was trained on a small part of the full dataset (1%, 3 GB) and without task prefixes.

For RSG (Russian SuperGLUE), we fine-tuned the model as described in the T5 paper. First, we trained it in a multitask setting on all RSG tasks. Then, for each task, we took the best checkpoint for that task and trained it further.

Model card: https://huggingface.co/sberbank-ai/FRED-T5-1.7B
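
The T5-style fine-tuning described above casts every task as text-to-text, which is what lets a single model train on all RSG tasks at once. A minimal sketch of that casting follows. The prefix strings, field names, target words and the sample sentences are illustrative assumptions, not the format used for this submission.

```python
# Hypothetical text-to-text casting for multitask fine-tuning.
# The prefix, field order and target string are assumptions made for
# illustration; the submission does not document its exact format.

def to_text_to_text(task, fields, label):
    """Turn one labelled example into a (source, target) pair of strings."""
    # The lower-cased task name acts as a prefix so that one model can
    # tell the tasks apart during multitask training.
    pieces = [task.lower()]
    for name, value in fields.items():
        pieces.append(f"{name}: {value}")
    return " ".join(pieces), str(label)


# Made-up TERRa-style example (not taken from the dataset).
source, target = to_text_to_text(
    "TERRa",
    {"premise": "Кошка спит на диване.", "hypothesis": "Кошка спит."},
    "entailment",
)
print(source)
print(target)
```

With this kind of casting, the second stage (continuing from the best multitask checkpoint on a single task) only changes which examples are fed to the model, not the input format.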

Parameter description:

Diagnostic (Matthews Correlation): 0.497

Category	Score
LOGIC	0.35169869588932245
KNOWLEDGE	0.41950270175059584
PREDICATE-ARGUMENT STRUCTURE	0.5373921746168259
LEXICAL SEMANTICS	0.5574648062951351

Lexical Semantics - Lexical Entailment	0.5179777310265292
Lexical Semantics - Morphological Negation	0.7319250547113999
Lexical Semantics - Factivity	0.37619206243122316
Lexical Semantics - Symmetry/Collectivity	0.6454972243679028
Lexical Semantics - Redundancy	0.45652173913043476
Lexical Semantics - Named Entities	0.5555555555555556
Lexical Semantics - Quantifiers	0.500422753610212
Predicate-Argument Structure - Core Args	0.6928203230275509
Predicate-Argument Structure - Prepositional Phrases	0.655564782695531
Predicate-Argument Structure - Ellipsis/Implicits	0.47925723781702034
Predicate-Argument Structure - Anaphora/Coreference	0.44034755759456745
Predicate-Argument Structure - Active/Passive	0.35321924163127
Predicate-Argument Structure - Nominalization	0.8444444444444444
Predicate-Argument Structure - Genitives/Partitives	0.7637626158259734
Predicate-Argument Structure - Datives	0.629940788348712
Predicate-Argument Structure - Relative Clauses	0.4666666666666667
Predicate-Argument Structure - Coordination Scopes	0.48038446141526137
Predicate-Argument Structure - Intersectivity	0.36786594593516775
Predicate-Argument Structure - Restrictivity	0.4963635881027162
Logic - Negation	0.05704384514499037
Logic - Double Negation	0.38604948085158797
Logic - Interval/Numbers	0.15418961562809935
Logic - Conjunction	0.38666666666666666
Logic - Disjunction	0.2926976883388273
Logic - Conditionals	0.1972421118046462
Logic - Universal	0.7774288420142416
Logic - Existential	0.38981938376529196
Logic - Temporal	0.47306844125299624
Logic - Upward Monotone	0.837707816583391
Logic - Downward Monotone	-0.311749325707824
Logic - Non-Monotonic	0.30434782608695654
Knowledge - Common Sense	0.43009118541033436
Knowledge - World Knowledge	0.39182408938141844
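
For reading the diagnostic scores, it may help to recall how the Matthews correlation coefficient is computed. The sketch below assumes binary labels and uses made-up confusion-matrix counts, not counts from the diagnostic set. The coefficient ranges from -1 to 1, and a negative value (as in the Downward Monotone row) means the predictions are anti-correlated with the gold labels.

```python
import math


def matthews_corr(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0


# Made-up counts for illustration only.
mostly_right = matthews_corr(tp=40, tn=35, fp=15, fn=10)
mostly_wrong = matthews_corr(tp=10, tn=15, fp=35, fn=40)
print(mostly_right)  # positive: predictions agree with the gold labels
print(mostly_wrong)  # negative: predictions are anti-correlated
```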

Performance:

Dataset	Speed	RAM
LiDiRus	-	-
RCB	-	-
PARus	-	-
MuSeRC	-	-
TERRa	-	-
RUSSE	-	-
RWSD	-	-
DaNetQA	-	-
RuCoS	-	-

Submission FRED-T5 1.7B finetune

Total score: 0.762
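
The total appears consistent with a plain average over the nine tasks, where tasks that report two metrics are first averaged internally. The sketch below reproduces it from the per-task scores listed at the top of this page; the aggregation rule is an assumption based on the SuperGLUE convention, not something this page states.

```python
# Per-task scores copied from the score table at the top of this page.
# Tasks with two metrics list both values.
scores = {
    "LiDiRus": [0.497],
    "RCB": [0.497, 0.541],
    "PARus": [0.842],
    "MuSeRC": [0.916, 0.773],
    "TERRa": [0.871],
    "RUSSE": [0.823],
    "RWSD": [0.669],
    "DaNetQA": [0.889],
    "RuCoS": [0.9, 0.902],
}


def total_score(task_scores):
    """Average metrics within each task, then average across tasks."""
    per_task = [sum(values) / len(values) for values in task_scores.values()]
    return sum(per_task) / len(per_task)


print(round(total_score(scores), 3))  # 0.762
```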
