Russian SuperGLUE

Dataset	Score	Metric
LiDiRus	0.369	Matthews Corr
RCB	0.328 / 0.457	F1/Acc
PARus	0.59	Accuracy
MuSeRC	0.809 / 0.501	F1a/EM
TERRa	0.798	Accuracy
RUSSE	0.765	Accuracy
RWSD	0.669	Accuracy
DaNetQA	0.757	Accuracy
RuCoS	0.89 / 0.886	F1/EM

Model description:

The underlying model, xlm-roberta-large (560M parameters), was pre-trained by Facebook on the multilingual CC100 dataset, which includes Russian. We fine-tune the pre-trained model separately for each Russian SuperGLUE task. For LiDiRus, we start from the TERRa fine-tuned model. For RWSD, we use the majority-class baseline. For each task, we save a checkpoint after every epoch and submit the one that performs best on the validation metrics.
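The two selection rules mentioned above (majority-class prediction for RWSD and per-epoch checkpoint selection) can be sketched in a few lines. This is only an illustration: the function names and the example numbers are hypothetical, and the submission's actual training code is not shown in this description.

```python
from collections import Counter


def majority_class_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test example."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    return [majority_label] * n_test


def pick_best_checkpoint(val_scores):
    """Return the 1-based epoch whose checkpoint scored highest on validation.

    val_scores[i] is the validation metric measured after epoch i + 1.
    Ties are resolved in favour of the earliest epoch.
    """
    best_index = max(range(len(val_scores)), key=lambda i: val_scores[i])
    return best_index + 1


# Hypothetical values, for illustration only.
best_epoch = pick_best_checkpoint([0.71, 0.78, 0.80, 0.79])
baseline_preds = majority_class_baseline([True, False, False, False, True], n_test=3)
print(best_epoch, baseline_preds)
```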

Parameter description:

Hyper-parameters for fine-tuning xlm-roberta-large: batch size {8, 16, 32, 64}, epochs {20, 30}, lr {1e-06, 2e-06, 1e-05, 3e-05}, lr scheduler {constant, linear}, warmup ratio {0.02, 0.05, 0.1}, weight decay {0, 0.01, 0.1}.
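The search space above can be written out as a plain dictionary and enumerated. This is a sketch only: the key names are assumptions, and the description does not say whether the full grid was searched or only a subset of it.

```python
from itertools import product

# Values copied from the parameter description; key names are illustrative.
search_space = {
    "batch_size": [8, 16, 32, 64],
    "epochs": [20, 30],
    "learning_rate": [1e-06, 2e-06, 1e-05, 3e-05],
    "lr_scheduler": ["constant", "linear"],
    "warmup_ratio": [0.02, 0.05, 0.1],
    "weight_decay": [0, 0.01, 0.1],
}


def iter_configs(space):
    """Yield one dict per combination of hyper-parameter values."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(search_space))
print(len(configs))  # 4 * 2 * 4 * 2 * 3 * 3 = 576 combinations
```

An exhaustive sweep would therefore mean 576 fine-tuning runs per task, so in practice such a space is often sampled rather than enumerated.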

Diagnostic (Matthews Correlation): 0.369

Category	Score
LOGIC	0.1963386058559272
KNOWLEDGE	0.3156834130078425
PREDICATE-ARGUMENT STRUCTURE	0.3777715485318068
LEXICAL SEMANTICS	0.45839855677638014

Lexical Semantics - Lexical Entailment	0.4282616143136151
Lexical Semantics - Morphological Negation	0.45175395145262565
Lexical Semantics - Factivity	0.18428853505018536
Lexical Semantics - Symmetry/Collectivity	0.43852900965351466
Lexical Semantics - Redundancy	0.27348301713730944
Lexical Semantics - Named Entities	0.38949041885226005
Lexical Semantics - Quantifiers	0.501227406091964
Predicate-Argument Structure - Core Args	0.40943028340181464
Predicate-Argument Structure - Prepositional Phrases	0.5823384109195534
Predicate-Argument Structure - Ellipsis/Implicits	0.3263956049169334
Predicate-Argument Structure - Anaphora/Coreference	0.26875600982680947
Predicate-Argument Structure - Active/Passive	0.1835325870964494
Predicate-Argument Structure - Nominalization	0.4687501237868722
Predicate-Argument Structure - Genitives/Partitives	0.8660254037844386
Predicate-Argument Structure - Datives	0.629940788348712
Predicate-Argument Structure - Relative Clauses	0.16265001215808886
Predicate-Argument Structure - Coordination Scopes	0.33454829277463405
Predicate-Argument Structure - Intersectivity	0.2553606237816764
Predicate-Argument Structure - Restrictivity	0.2748737083745107
Logic - Negation	0.045531262041544854
Logic - Double Negation	0.30151134457776363
Logic - Interval/Numbers	-0.1050485078938172
Logic - Conjunction	0.42163702135578396
Logic - Disjunction	0.008695652173913044
Logic - Conditionals	0.14285714285714285
Logic - Universal	0.4029114820126901
Logic - Existential	0.04279604925109129
Logic - Temporal	0.2548235957188128
Logic - Upward Monotone	0.623033246356214
Logic - Downward Monotone	-0.06726727939963124
Logic - Non-Monotonic	0.09267505241022214
Knowledge - Common Sense	0.2876139817629179
Knowledge - World Knowledge	0.3387649364472491
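
For reference, the diagnostic scores above are Matthews correlation coefficients, which range from -1 to 1, with values near 0 meaning no better than chance. A minimal sketch of the binary form of the metric is given below. The function name is hypothetical, and returning 0.0 when the denominator is zero is a common convention rather than something stated in this description.

```python
import math


def matthews_corr(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0.0 when any row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, for illustration only.
print(matthews_corr(tp=6, tn=3, fp=1, fn=2))
```

Negative entries in the table, such as the one for Interval/Numbers, are therefore valid: they indicate predictions that are slightly anti-correlated with the gold labels in that category.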

Performance:

Dataset	Speed	RAM
LiDiRus	-	-
RCB	-	-
PARus	-	-
MuSeRC	-	-
TERRa	-	-
RUSSE	-	-
RWSD	-	-
DaNetQA	-	-
RuCoS	-	-

Submission xlm-roberta-large (Facebook) finetune

Total score: 0.654
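
The total score is consistent with an unweighted mean over the nine tasks, where tasks reported with two metrics are first averaged into a single number. The aggregation rule is an assumption (it is not stated in this description), but it reproduces the reported value from the per-task scores listed above:

```python
from statistics import mean

# Per-task scores copied from the results table; two-metric tasks list both values.
task_scores = {
    "LiDiRus": [0.369],
    "RCB": [0.328, 0.457],
    "PARus": [0.59],
    "MuSeRC": [0.809, 0.501],
    "TERRa": [0.798],
    "RUSSE": [0.765],
    "RWSD": [0.669],
    "DaNetQA": [0.757],
    "RuCoS": [0.89, 0.886],
}

# Assumed rule: average the metrics within each task, then average across tasks.
per_task = {task: mean(values) for task, values in task_scores.items()}
total = mean(per_task.values())
print(round(total, 3))  # 0.654
```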