Russian SuperGLUE

Dataset	Score	Metric
LiDiRus	0.389	Matthews Corr
RCB	0.456 / 0.546	F1/Acc
PARus	0.776	Accuracy
MuSeRC	0.887 / 0.678	F1a/EM
TERRa	0.801	Accuracy
RUSSE	0.775	Accuracy
RWSD	0.669	Accuracy
DaNetQA	0.799	Accuracy
RuCoS	0.87 / 0.863	F1/EM

Model description:

FRED-T5 large (800M parameters; Full-scale Russian Enhanced Denoisers T5) is based on the T5 architecture, with 24 layers and a hidden size of 1024. The model was pretrained on a mixture of 7 denoisers, similar to UL2 but with several differences, using a 300 GB Russian-language corpus, the same dataset used for the ruT5 models. It uses a BBPE tokenizer. For the first half of pretraining, the model was trained on a small part of the dataset (1%, 3 GB) and without task prefixes. For RSG, we fine-tuned the model as described in the T5 paper: first we trained it in a multitask setting on all RSG tasks, then for each task we took the best checkpoint and trained it further.
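To make the denoising objective concrete, here is a minimal sketch of a single T5-style span-corruption denoiser. UL2-style mixtures combine several such denoisers with different span lengths and corruption rates. The function name `span_corrupt`, the `<extra_id_N>` sentinel format, and the default rate and span length are illustrative assumptions, not the actual FRED-T5 settings, which are not given here.

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, span_len=3, seed=0):
    """Mask random spans with sentinel tokens; return (inputs, targets).

    All defaults are illustrative assumptions, not FRED-T5's configuration.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    n_to_mask = min(len(tokens), max(1, round(len(tokens) * corruption_rate)))

    # Pick positions to mask, span by span, until the budget is used up.
    masked = set()
    while len(masked) < n_to_mask:
        start = rng.randrange(len(tokens))
        for i in range(start, min(start + span_len, len(tokens))):
            if len(masked) < n_to_mask:
                masked.add(i)

    # Replace each contiguous masked run with one sentinel in the inputs;
    # the targets list each sentinel followed by the tokens it hides.
    inputs, targets, sentinel = [], [], 0
    prev_masked = False
    for i, tok in enumerate(tokens):
        if i in masked:
            if not prev_masked:
                marker = f"<extra_id_{sentinel}>"  # assumed sentinel format
                inputs.append(marker)
                targets.append(marker)
                sentinel += 1
            targets.append(tok)
            prev_masked = True
        else:
            inputs.append(tok)
            prev_masked = False
    return inputs, targets

example_tokens = "a b c d e f g h i j k l m n o p q r s t".split()
example_inputs, example_targets = span_corrupt(example_tokens)
print(example_inputs)
print(example_targets)
```

The model is trained to generate the targets given the inputs, so substituting each sentinel's span back into the inputs recovers the original sequence.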

Parameter description:

Diagnostic (Matthews Correlation): 0.389

Category	Score
LOGIC	0.22130694844664164
KNOWLEDGE	0.41862776937029456
PREDICATE-ARGUMENT STRUCTURE	0.35695995629214483
LEXICAL SEMANTICS	0.48869263624764997

Lexical Semantics - Lexical Entailment	0.40433843702131134
Lexical Semantics - Morphological Negation	0.4500514373894347
Lexical Semantics - Factivity	0.3843711067980367
Lexical Semantics - Symmetry/Collectivity	0.5477225575051661
Lexical Semantics - Redundancy	0.6756639246921762
Lexical Semantics - Named Entities	0.50709255283711
Lexical Semantics - Quantifiers	0.5607478659394147
Predicate-Argument Structure - Core Args	0.6
Predicate-Argument Structure - Prepositional Phrases	0.44921394657019476
Predicate-Argument Structure - Ellipsis/Implicits	0.1649915822768611
Predicate-Argument Structure - Anaphora/Coreference	0.31491968278395555
Predicate-Argument Structure - Active/Passive	0.31353483628976775
Predicate-Argument Structure - Nominalization	0.6024640760767093
Predicate-Argument Structure - Genitives/Partitives	0.49099025303098287
Predicate-Argument Structure - Datives	0.629940788348712
Predicate-Argument Structure - Relative Clauses	0.2537340189666186
Predicate-Argument Structure - Coordination Scopes	0.14434609461063858
Predicate-Argument Structure - Intersectivity	0.3107299650387684
Predicate-Argument Structure - Restrictivity	-0.009732619374563063
Logic - Negation	0.08956930089044976
Logic - Double Negation	0.420084025208403
Logic - Interval/Numbers	0.03262334552859464
Logic - Conjunction	0.39440531887330776
Logic - Disjunction	-0.011350087076783314
Logic - Conditionals	0.1259881576697424
Logic - Universal	0.3959441875175622
Logic - Existential	0.5384615384615384
Logic - Temporal	0.24675324675324675
Logic - Upward Monotone	0.7894736842105263
Logic - Downward Monotone	-0.22171945701357465
Logic - Non-Monotonic	0.15445574836957118
Knowledge - Common Sense	0.41076108715328863
Knowledge - World Knowledge	0.41071131554955886
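
The diagnostic scores above are Matthews correlation coefficients, which range from -1 to 1, with 0 meaning chance-level agreement. That is why a few categories (Restrictivity, Disjunction, Downward Monotone) come out negative. As a rough illustration, the sketch below computes the coefficient from a binary confusion matrix. It assumes a two-class setup, and the counts are hypothetical, not taken from the LiDiRus data.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation for a binary confusion matrix.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts, chosen only to show the range of the metric.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)    # every prediction right
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)   # no better than guessing
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)   # every prediction wrong

print(perfect, chance, inverted)
```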

Performance:

Dataset	Speed	RAM
LiDiRus	-	-
RCB	-	-
PARus	-	-
MuSeRC	-	-
TERRa	-	-
RUSSE	-	-
RWSD	-	-
DaNetQA	-	-
RuCoS	-	-

Submission FRED-T5 large finetune

Total score: 0.706
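
As a sanity check, the total can be approximately reproduced from the per-dataset scores. The sketch below assumes the total is the unweighted mean over the nine tasks, with two-metric tasks first averaged into a single number. This aggregation rule is an assumption, not something stated in the submission. Under it the table values give roughly 0.7066, close to the reported 0.706. The small gap may come from rounding in the table.

```python
# Per-task scores copied from the results table; two-metric tasks list both values.
scores = {
    "LiDiRus": [0.389],
    "RCB": [0.456, 0.546],
    "PARus": [0.776],
    "MuSeRC": [0.887, 0.678],
    "TERRa": [0.801],
    "RUSSE": [0.775],
    "RWSD": [0.669],
    "DaNetQA": [0.799],
    "RuCoS": [0.87, 0.863],
}

def total_score(task_scores):
    """Average metrics within each task, then average across tasks (assumed rule)."""
    per_task = [sum(values) / len(values) for values in task_scores.values()]
    return sum(per_task) / len(per_task)

total = total_score(scores)
print(round(total, 4))
```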