Feb. 8, 2023, 7:40 a.m.
Team: SberDevices
Model URL: https://huggingface.co/microsoft/mdeberta-v3-base
| Dataset | Score | Metric |
|---|---|---|
| LiDiRus | 0.332 | Matthews Corr |
| RCB | 0.27 / 0.489 | F1/Acc |
| PARus | 0.716 | Accuracy |
| MuSeRC | 0.825 / 0.531 | F1a/EM |
| TERRa | 0.783 | Accuracy |
| RUSSE | 0.727 | Accuracy |
| RWSD | 0.669 | Accuracy |
| DaNetQA | 0.708 | Accuracy |
| RuCoS | 0.87 / 0.868 | F1/EM |
The underlying model (280M parameters) was pre-trained by Microsoft on the CC100 multilingual dataset, which includes Russian. We fine-tune the pre-trained model separately for each Russian SuperGLUE task. For LiDiRus, which is a diagnostic set with no training data of its own, we start from the TERRa-finetuned model. For RWSD we use the majority-class baseline. For each task we submit the best-performing checkpoint (saved after every epoch) according to the validation metrics, as sketched below.
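
A minimal sketch of this per-task fine-tuning setup, assuming the Hugging Face `transformers` Trainer API; the TERRa data loading, label count, and accuracy metric shown here are illustrative, and each task needs its own preprocessing:

```python
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# One Russian SuperGLUE task as an example (TERRa: premise/hypothesis pairs).
dataset = load_dataset("russian_super_glue", "terra")

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True)

encoded = dataset.map(tokenize, batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

args = TrainingArguments(
    output_dir="mdeberta-terra",
    evaluation_strategy="epoch",       # validate after every epoch
    save_strategy="epoch",             # keep a checkpoint per epoch
    load_best_model_at_end=True,       # submit the checkpoint with the
    metric_for_best_model="accuracy",  # best validation metric, as described
    learning_rate=2e-5,                # one point from the grid below
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    weight_decay=0.01,
    per_device_train_batch_size=32,
    num_train_epochs=20,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,
    compute_metrics=accuracy,
)
trainer.train()
```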
Hyper-parameter search space for fine-tuning mdeberta-v3-base: batch size {8, 32}, epochs {20, 30}, learning rate {1e-05, 2e-05}, LR scheduler {constant, linear}, warmup ratio {0.02, 0.05, 0.1}, weight decay {0, 0.01, 0.1}.
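
A minimal sketch enumerating this grid as a plain exhaustive sweep; the `TrainingArguments`-style key names are our own mapping of the parameters above:

```python
from itertools import product

# Search space from the description above: 2 x 2 x 2 x 2 x 3 x 3 = 144 runs.
grid = {
    "per_device_train_batch_size": [8, 32],
    "num_train_epochs": [20, 30],
    "learning_rate": [1e-5, 2e-5],
    "lr_scheduler_type": ["constant", "linear"],
    "warmup_ratio": [0.02, 0.05, 0.1],
    "weight_decay": [0.0, 0.01, 0.1],
}

# Each configuration would be passed to the fine-tuning routine above,
# keeping the best one per task based on the validation metric.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(f"{len(configs)} configurations to try")
print(configs[0])
```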
| Category | Score |
|---|---|
| LOGIC | 0.2206319540246672 |
| KNOWLEDGE | 0.20929073547071417 |
| PREDICATE-ARGUMENT STRUCTURE | 0.27521223037544945 |
| LEXICAL SEMANTICS | 0.44224979487700394 |

| Subcategory | Score |
|---|---|
| Lexical Semantics - Lexical Entailment | 0.5388317294065962 |
| Lexical Semantics - Morphological Negation | 0.0 |
| Lexical Semantics - Factivity | 0.13725490196078433 |
| Lexical Semantics - Symmetry/Collectivity | 0.4195731958391368 |
| Lexical Semantics - Redundancy | 0.4728662437434604 |
| Lexical Semantics - Named Entities | 0.22360679774997896 |
| Lexical Semantics - Quantifiers | 0.47387910220727386 |
| Predicate-Argument Structure Core Args | 0.5111111111111111 |
| Predicate-Argument Structure Prepositional Phrases | 0.33061696660711537 |
| Predicate-Argument Structure Ellipsis/Implicits | 0.4714045207910317 |
| Predicate-Argument Structure Anaphora/Coreference | -0.07425514638437677 |
| Predicate-Argument Structure Active/Passive | 0.3059179896521843 |
| Predicate-Argument Structure Nominalization | 0.4687501237868722 |
| Predicate-Argument Structure Genitives/Partitives | 0.5773502691896257 |
| Predicate-Argument Structure Datives | 0.5238095238095238 |
| Predicate-Argument Structure Relative Clauses | 0.15555555555555556 |
| Predicate-Argument Structure Coordination Scopes | 0.20675476728754563 |
| Predicate-Argument Structure Intersectivity | 0.14159846508095775 |
| Predicate-Argument Structure Restrictivity | 0.055227791305300936 |
| Logic Negation | 0.11177050727031347 |
| Logic Double Negation | -0.10050378152592121 |
| Logic Interval/Numbers | -0.033731512431528755 |
| Logic Conjunction | 0.3892494720807615 |
| Logic Disjunction | 0.053838190205816545 |
| Logic Conditionals | 0.34470668367478197 |
| Logic Universal | 0.5324675324675324 |
| Logic Existential | 0.5604395604395604 |
| Logic Temporal | 0.0657951694959769 |
| Logic Upward Monotone | 0.43157894736842106 |
| Logic Downward Monotone | 0.12584555642690842 |
| Logic Non-Monotonic | 0.1286978904175574 |
| Knowledge Common Sense | 0.23847518304648244 |
| Knowledge World Knowledge | 0.15899536317006538 |
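
Each diagnostic score above is a Matthews correlation coefficient (MCC): 1 means perfect agreement with the gold labels, 0 is chance level, and negative values (e.g. Anaphora/Coreference) indicate below-chance agreement. A minimal sketch of the computation with scikit-learn, using illustrative labels:

```python
from sklearn.metrics import matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # gold entailment labels (illustrative)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions (illustrative)
print(matthews_corrcoef(y_true, y_pred))  # 0.5; the score lies in [-1, 1]
```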
Inference speed and RAM usage were not reported for this submission:

| Dataset | Speed | RAM |
|---|---|---|
| LiDiRus | - | - |
| RCB | - | - |
| PARus | - | - |
| MuSeRC | - | - |
| TERRa | - | - |
| RUSSE | - | - |
| RWSD | - | - |
| DaNetQA | - | - |
| RuCoS | - | - |