Russian SuperGLUE

Dataset	Score	Metric
LiDiRus	0.124	Matthews Corr
RCB	0.408 / 0.447	F1/Acc
PARus	0.766	Accuracy
MuSeRC	0.673 / 0.364	F1a/EM
TERRa	0.605	Accuracy
RUSSE	0.587	Accuracy
RWSD	0.669	Accuracy
DaNetQA	0.637	Accuracy
RuCoS	0.86 / 0.859	F1/EM

Model description:

Autoregressive transformer language model with 18 layers (1.04 billion parameters), trained on a dataset composed of filtered Runet text, Russian Wikipedia, the RDT and Taiga datasets, social media dialogs, and articles from Yandex.News. The model was neither trained nor fine-tuned on any of the Russian SuperGLUE (RSG) tasks. Evaluation is performed in a few-shot manner: candidate answers are ranked by the model's log-score conditioned on handcrafted task descriptions.
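
The answer-ranking step described above can be sketched roughly as follows. This is a minimal illustration, not the submission's actual code: `rank_by_log_score`, `toy_log_score`, and the `TOY_PROBS` table are hypothetical names, and the lookup table merely stands in for a real language model that would return the log-probability of a candidate answer given the task prompt.

```python
import math
from typing import Callable, List, Sequence

def rank_by_log_score(
    prompt: str,
    candidates: Sequence[str],
    log_score: Callable[[str, str], float],
) -> List[str]:
    """Return candidates ordered from highest to lowest log-score given the prompt."""
    scored = [(log_score(prompt, candidate), candidate) for candidate in candidates]
    # Sort by score only; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]

# Hypothetical stand-in for a language model: fixed conditional probabilities
# for (prompt, answer) pairs. A real model would sum token log-probabilities.
TOY_PROBS = {
    ("Question: Is snow white? Answer:", "yes"): 0.8,
    ("Question: Is snow white? Answer:", "no"): 0.2,
}

def toy_log_score(prompt: str, answer: str) -> float:
    """Log-probability of the answer under the toy table (tiny floor if unseen)."""
    return math.log(TOY_PROBS.get((prompt, answer), 1e-9))

prompt = "Question: Is snow white? Answer:"
ranking = rank_by_log_score(prompt, ["no", "yes"], toy_log_score)
prediction = ranking[0]  # the top-ranked candidate is taken as the model's answer
```

In this sketch the prompt plays the role of the handcrafted task description (plus any few-shot examples), and the prediction is simply the highest-scoring candidate.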

Parameter description:

Diagnostic (Matthews Correlation): 0.124

Category	Score
LOGIC	0.09858060032374041
KNOWLEDGE	0.15038327050458106
PREDICATE-ARGUMENT STRUCTURE	0.13440660949444153
LEXICAL SEMANTICS	0.13102802801087352

Lexical Semantics - Lexical Entailment	0.06934162091692797
Lexical Semantics - Morphological Negation	0.32433748657040123
Lexical Semantics - Factivity	0.223606797749979
Lexical Semantics - Symmetry/Collectivity	0.41079191812887456
Lexical Semantics - Redundancy	-0.1444869078105018
Lexical Semantics - Named Entities	0.12403473458920847
Lexical Semantics - Quantifiers	-0.0036806768356424067
Predicate-Argument Structure - Core Args	0.2786692764848067
Predicate-Argument Structure - Prepositional Phrases	0.08330381216573046
Predicate-Argument Structure - Ellipsis/Implicits	0.05555555555555555
Predicate-Argument Structure - Anaphora/Coreference	0.1170035878603409
Predicate-Argument Structure - Active/Passive	0.27604950431925757
Predicate-Argument Structure - Nominalization	0.29433147307547786
Predicate-Argument Structure - Genitives/Partitives	0.0
Predicate-Argument Structure - Datives	0.047619047619047616
Predicate-Argument Structure - Relative Clauses	0.034815531191139566
Predicate-Argument Structure - Coordination Scopes	0.10482848367219183
Predicate-Argument Structure - Intersectivity	0.2724196464492864
Predicate-Argument Structure - Restrictivity	-0.28741691319281637
Logic - Negation	0.024218258071629018
Logic - Double Negation	0.4271410714851651
Logic - Interval/Numbers	0.2680281337094487
Logic - Conjunction	0.2009535214960729
Logic - Disjunction	-0.12841543508778389
Logic - Conditionals	-0.047619047619047616
Logic - Universal	0.01413506985480439
Logic - Existential	0.2058790548922549
Logic - Temporal	-0.01698823971458752
Logic - Upward Monotone	-0.05923488777590923
Logic - Downward Monotone	-0.06726727939963124
Logic - Non-Monotonic	-0.017197979012252743
Knowledge - Common Sense	0.12244223167879241
Knowledge - World Knowledge	0.1536233973434768
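
For readers unfamiliar with the metric, the diagnostic scores above are Matthews correlation coefficients, which range from -1 (total disagreement) through 0 (chance level) to 1 (perfect agreement). The sketch below computes the binary form of the coefficient from scratch. It assumes a two-class labeling (1 for entailment, 0 otherwise), and the function name and the sample labels are illustrative only, not taken from the benchmark's scoring code.

```python
import math
from typing import Sequence

def matthews_corrcoef_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention: report 0.0 when a row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Illustrative labels (not benchmark data): one positive example is missed.
example_mcc = matthews_corrcoef_binary([1, 1, 0, 0], [1, 0, 0, 0])
```

Negative values in the table therefore indicate predictions that are anti-correlated with the gold labels on that category, while a value of exactly 0.0 can come either from chance-level predictions or from the zero-denominator convention shown in the code.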

Performance:

Dataset	Speed	RAM
LiDiRus	-	-
RCB	-	-
PARus	-	-
MuSeRC	-	-
TERRa	-	-
RUSSE	-	-
RWSD	-	-
DaNetQA	-	-
RuCoS	-	-

Submission YaLM 1.0B few-shot

Total score: 0.577
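
The total score is consistent with an unweighted mean over the nine tasks, where a task that reports two metrics first has them averaged. The sketch below reproduces that arithmetic from the per-task scores listed in the results table. The aggregation rule is inferred from the numbers rather than stated on this page, and `TASK_SCORES` and `total_score` are illustrative names.

```python
from typing import Dict, Tuple

# Per-task scores copied from the results table; two-metric tasks keep both values.
TASK_SCORES: Dict[str, Tuple[float, ...]] = {
    "LiDiRus": (0.124,),
    "RCB": (0.408, 0.447),
    "PARus": (0.766,),
    "MuSeRC": (0.673, 0.364),
    "TERRa": (0.605,),
    "RUSSE": (0.587,),
    "RWSD": (0.669,),
    "DaNetQA": (0.637,),
    "RuCoS": (0.86, 0.859),
}

def total_score(task_scores: Dict[str, Tuple[float, ...]]) -> float:
    """Average the metrics within each task, then average across tasks."""
    per_task = [sum(metrics) / len(metrics) for metrics in task_scores.values()]
    return sum(per_task) / len(per_task)

total = total_score(TASK_SCORES)
print(round(total, 3))  # prints 0.577
```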