Russian SuperGLUE

Dataset	Score	Metric
LiDiRus	0.364	Matthews Corr
RCB	0.357 / 0.479	F1/Acc
PARus	0.834	Accuracy
MuSeRC	0.892 / 0.707	F1a/EM
TERRa	0.841	Accuracy
RUSSE	0.71	Accuracy
RWSD	0.669	Accuracy
DaNetQA	0.85	Accuracy
RuCoS	0.92 / 0.916	F1/EM
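RuCoS and MuSeRC each report two numbers. The sketch below shows how such metrics are commonly computed. It assumes that RuCoS follows the ReCoRD convention (token-level F1 and exact match against the best-matching gold answer, averaged over queries) and that MuSeRC follows MultiRC (F1a is binary F1 over all answer options pooled together, EM is the share of questions whose options are all labelled correctly). The normalization (lowercasing, stripping ASCII punctuation, collapsing whitespace) and all function names are hypothetical; the official Russian SuperGLUE scorer may differ in details.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation and collapse whitespace (assumed normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def rucos_query_scores(prediction, golds):
    """ReCoRD-style (F1, EM) for one query: best score over all acceptable gold answers."""
    f1 = max(token_f1(prediction, g) for g in golds)
    em = max(float(normalize(prediction) == normalize(g)) for g in golds)
    return f1, em


def muserc_scores(questions):
    """MultiRC-style (F1a, EM).

    questions: list of (predicted, gold) pairs; each is a list of 0/1 labels,
    one per answer option of that question.
    """
    tp = fp = fn = 0
    exact = 0
    for predicted, gold in questions:
        for p, g in zip(predicted, gold):
            tp += int(p == 1 and g == 1)
            fp += int(p == 1 and g == 0)
            fn += int(p == 0 and g == 1)
        # EM counts a question only if every answer option is labelled correctly.
        exact += int(list(predicted) == list(gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1a = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1a, exact / len(questions)


print(rucos_query_scores("Москва.", ["москва"]))  # (1.0, 1.0)
print(muserc_scores([([1, 0, 1], [1, 0, 1]), ([1, 1, 0], [1, 0, 0])]))  # F1a ≈ 0.857, EM = 0.5
```

Pooling answer options before computing F1a means questions with many options weigh more in F1a, while EM treats every question equally.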

Model description:

Another member of the YaLM family of autoregressive transformers, pretrained on a large corpus that includes filtered Runet text, Russian Wikipedia, the RDT and Taiga datasets, social media dialogs, and articles from Yandex.News. The model has 28 layers and 3.3 billion parameters. All tasks were solved with P-tuning: instead of hand-crafted prompts or full fine-tuning, we keep the model weights frozen and optimize only 40k parameters on the RSG training sets.
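The P-tuning idea can be illustrated with a toy sketch. It assumes a stand-in "model" (a fixed linear scorer over the mean token embedding, `FROZEN_W`) in place of YaLM and trains only a few continuous prompt vectors prepended to the input. This mirrors the split in this submission, where about 40k trainable parameters sit on top of roughly 3.3B frozen ones (40,000 / 3,300,000,000 ≈ 0.0012% of the total). All names are hypothetical, and the sketch omits details a real setup may use, such as a prompt encoder or a task-specific output mapping.

```python
import math

# Hypothetical frozen "language model": a fixed linear scorer over the mean token embedding.
# It stands in for the frozen network; these weights are never updated.
FROZEN_W = (0.5, -1.0, 2.0)
DIM = len(FROZEN_W)


def forward(prompt, tokens):
    """Prepend the continuous prompt vectors to the input embeddings and return P(label=1)."""
    seq = prompt + tokens
    mean = [sum(vec[j] for vec in seq) / len(seq) for j in range(DIM)]
    logit = sum(w * m for w, m in zip(FROZEN_W, mean))
    return 1.0 / (1.0 + math.exp(-logit))


def train_prompt(data, n_prompt=2, lr=0.5, epochs=200):
    """Fit only the prompt vectors with SGD on the log-loss; FROZEN_W stays fixed."""
    prompt = [[0.0] * DIM for _ in range(n_prompt)]  # the only trainable parameters
    for _ in range(epochs):
        for tokens, label in data:
            seq_len = n_prompt + len(tokens)
            err = forward(prompt, tokens) - label  # d(log-loss)/d(logit)
            for vec in prompt:
                for j in range(DIM):
                    # d(logit)/d(vec[j]) = FROZEN_W[j] / seq_len
                    vec[j] -= lr * err * FROZEN_W[j] / seq_len
    return prompt


data = [([[0.0, 0.0, 0.0]], 1), ([[1.0, 1.0, 0.0]], 1)]
prompt = train_prompt(data)
print(sum(len(vec) for vec in prompt))  # 6 trainable numbers
print(forward(prompt, [[0.0, 0.0, 0.0]]) > 0.9)  # True after training
```

Only the prompt vectors receive gradient updates; the scorer's behaviour on a task changes purely through what is prepended to its input.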

Parameter description:

Diagnostic (Matthews Correlation): 0.364

Category	Score
LOGIC	0.2220002391136312
KNOWLEDGE	0.3731079163231701
PREDICATE-ARGUMENT STRUCTURE	0.3379586695984027
LEXICAL SEMANTICS	0.4914961200052112

Lexical Semantics - Lexical Entailment	0.44614865235538076
Lexical Semantics - Morphological Negation	0.5619514869490163
Lexical Semantics - Factivity	0.45098039215686275
Lexical Semantics - Symmetry/Collectivity	0.570544330734548
Lexical Semantics - Redundancy	0.27348301713730944
Lexical Semantics - Named Entities	0.35355339059327373
Lexical Semantics - Quantifiers	0.32256184896566575
Predicate-Argument Structure Core Args	0.46312262138612026
Predicate-Argument Structure Prepositional Phrases	0.6257214261204825
Predicate-Argument Structure Ellipsis/Implicits	0.3490449235764129
Predicate-Argument Structure Anaphora/Coreference	-0.004901960784313725
Predicate-Argument Structure Active/Passive	0.12565617248750865
Predicate-Argument Structure Nominalization	0.5017348819226064
Predicate-Argument Structure Genitives/Partitives	0.7637626158259734
Predicate-Argument Structure Datives	0.2058790548922549
Predicate-Argument Structure Relative Clauses	0.24370871833797697
Predicate-Argument Structure Coordination Scopes	0.366007208697342
Predicate-Argument Structure Intersectivity	0.2370100583365928
Predicate-Argument Structure Restrictivity	0.3619613829965134
Logic Negation	0.27900096804546626
Logic Double Negation	0.30151134457776363
Logic Interval/Numbers	-0.27124713338229184
Logic Conjunction	0.6
Logic Disjunction	0.27608872259967787
Logic Conditionals	0.07100716024967263
Logic Universal	0.6446583712203042
Logic Existential	0.17910620335162064
Logic Temporal	-0.15749883157896472
Logic Upward Monotone	0.7190325425674107
Logic Downward Monotone	-0.22171945701357465
Logic Non-Monotonic	0.05572782125753529
Knowledge Common Sense	0.36452669663330906
Knowledge World Knowledge	0.36188997993038646
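Each diagnostic score above is a Matthews correlation coefficient. A minimal sketch of how per-category values can be obtained, assuming binary labels (1 = entailment, 0 = not entailment) and assuming each category is scored on the subset of LiDiRus examples annotated with it. Negative values, such as those for Anaphora/Coreference or Interval/Numbers, mean predictions on that subset are anti-correlated with the gold labels. The function names and the toy examples are hypothetical.

```python
import math


def matthews_corr(gold, pred):
    """Binary Matthews correlation coefficient; returns 0.0 when a marginal is empty."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def per_category_mcc(examples):
    """examples: list of (gold, pred, categories); score each category on its own subset."""
    by_cat = {}
    for gold, pred, cats in examples:
        for cat in cats:
            g, p = by_cat.setdefault(cat, ([], []))
            g.append(gold)
            p.append(pred)
    return {cat: matthews_corr(g, p) for cat, (g, p) in by_cat.items()}


examples = [
    (1, 1, ["Logic"]),
    (0, 0, ["Logic", "Knowledge"]),
    (1, 0, ["Knowledge"]),
    (0, 1, ["Knowledge"]),
]
print(per_category_mcc(examples))  # {'Logic': 1.0, 'Knowledge': -0.5}
```

Because one example can carry several category tags, the per-category subsets overlap and are often small, which is one reason the fine-grained scores vary so widely.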

Performance:

Dataset	Speed	RAM
LiDiRus	-	-
RCB	-	-
PARus	-	-
MuSeRC	-	-
TERRa	-	-
RUSSE	-	-
RWSD	-	-
DaNetQA	-	-
RuCoS	-	-

Submission YaLM p-tune (3.3B frozen + 40k trainable params)

Total score: 0.711
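The total can be reproduced from the per-task table, assuming it is the unweighted mean over all nine tasks (LiDiRus included), where a task that reports two metrics first contributes their average. The dictionary and function names below are illustrative.

```python
# Scores copied from the Russian SuperGLUE table above; tasks with two metrics list both.
TASK_SCORES = {
    "LiDiRus": (0.364,),
    "RCB": (0.357, 0.479),
    "PARus": (0.834,),
    "MuSeRC": (0.892, 0.707),
    "TERRa": (0.841,),
    "RUSSE": (0.71,),
    "RWSD": (0.669,),
    "DaNetQA": (0.85,),
    "RuCoS": (0.92, 0.916),
}


def total_score(task_scores):
    """Unweighted mean over tasks, after averaging each task's metrics (assumed rule)."""
    per_task = [sum(metrics) / len(metrics) for metrics in task_scores.values()]
    return sum(per_task) / len(per_task)


print(round(total_score(TASK_SCORES), 4))  # 0.7115
```

The result, 0.7115, agrees with the reported 0.711 up to rounding, since the table values are themselves rounded to three decimals.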
