Russian SuperGLUE

Dataset	Score	Metric
LiDiRus	0.274	Matthews Corr
RCB	0.361 / 0.493	F1/Acc
PARus	0.554	Accuracy
MuSeRC	0.656 / 0.112	F1a/EM
TERRa	0.655	Accuracy
RUSSE	0.57	Accuracy
RWSD	0.623	Accuracy
DaNetQA	0.661	Accuracy
RuCoS	0.4 / 0.395	F1/EM

Model description:

Qwen 1.5 4B by Alibaba, fully fine-tuned on Ilya Gusev's Saiga dataset. Both training and evaluation were run on a Google TPU v4-32 provided by TRC (TPU Research Cloud). The evaluation script is available at https://github.com/defdet/rulm/blob/master/self_instruct/src/benchmarks/eval_rsg_tpu.py.

Parameter description:

Diagnostic (Matthews Correlation): 0.274

Category	Score
LOGIC	0.1982492693072155
KNOWLEDGE	0.26419115612435906
PREDICATE-ARGUMENT STRUCTURE	0.24601845359596022
LEXICAL SEMANTICS	0.3219335477393724

Lexical Semantics - Lexical Entailment	0.36425098156923136
Lexical Semantics - Morphological Negation	0.4147575310031266
Lexical Semantics - Factivity	0.34641016151377546
Lexical Semantics - Symmetry/Collectivity	-0.12171612389003691
Lexical Semantics - Redundancy	-0.08695652173913043
Lexical Semantics - Named Entities	0.2891574659831201
Lexical Semantics - Quantifiers	0.19045970117256472
Predicate-Argument Structure - Core Args	0.2297707831614211
Predicate-Argument Structure - Prepositional Phrases	0.34665482822302573
Predicate-Argument Structure - Ellipsis/Implicits	0.10622957319984967
Predicate-Argument Structure - Anaphora/Coreference	0.1667000100033345
Predicate-Argument Structure - Active/Passive	0.27640672769878033
Predicate-Argument Structure - Nominalization	0.35355339059327373
Predicate-Argument Structure - Genitives/Partitives	0.4900980294098034
Predicate-Argument Structure - Datives	0.5091750772173156
Predicate-Argument Structure - Relative Clauses	0.34097520463683817
Predicate-Argument Structure - Coordination Scopes	0.16834512458535864
Predicate-Argument Structure - Intersectivity	0.302433025214821
Predicate-Argument Structure - Restrictivity	-0.009732619374563063
Logic - Negation	0.3208739674079263
Logic - Double Negation	0.3481553119113957
Logic - Interval/Numbers	-0.16884185384856412
Logic - Conjunction	0.12909944487358058
Logic - Disjunction	0.41702882811414954
Logic - Conditionals	0.21821789023599236
Logic - Universal	0.15228622596829317
Logic - Existential	0.5384615384615384
Logic - Temporal	-0.09449929894737882
Logic - Upward Monotone	0.41130637283031135
Logic - Downward Monotone	-0.1504142093990467
Logic - Non-Monotonic	0.017197979012252743
Knowledge - Common Sense	0.19707513098159088
Knowledge - World Knowledge	0.33004157719410765
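
For readers unfamiliar with the metric, the diagnostic values above are Matthews correlation coefficients, which range from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect agreement). The sketch below shows how such a value can be computed for a binary task. It is not taken from the evaluation script: the function name `matthews_corr` and the label strings are assumptions made for illustration.

```python
import math


def matthews_corr(y_true, y_pred, positive="entailment"):
    """Binary Matthews correlation coefficient.

    `positive` names the label treated as the positive class; the default
    label string is an assumption for illustration.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is 0 when any marginal count is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels, not benchmark data.
gold = ["entailment", "entailment", "not_entailment", "not_entailment"]
pred = ["entailment", "not_entailment", "not_entailment", "not_entailment"]
print(matthews_corr(gold, pred))
```

A negative score for a category, such as Logic Interval/Numbers, means the predictions in that category were anti-correlated with the gold labels.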

Performance:

Dataset	Speed	RAM
LiDiRus	-	-
RCB	-	-
PARus	-	-
MuSeRC	-	-
TERRa	-	-
RUSSE	-	-
RWSD	-	-
DaNetQA	-	-
RuCoS	-	-

Submission: Qwen 4B saiga zero-shot

Total score: 0.505
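
The total appears consistent with an unweighted mean over the nine tasks, where a task that reports two metrics contributes the average of the pair. That aggregation rule is an assumption checked only against the numbers in the score table on this page. The names `scores` and `total_score` are illustrative.

```python
# Per-task scores copied from the score table on this page.
scores = {
    "LiDiRus": (0.274,),
    "RCB": (0.361, 0.493),
    "PARus": (0.554,),
    "MuSeRC": (0.656, 0.112),
    "TERRa": (0.655,),
    "RUSSE": (0.57,),
    "RWSD": (0.623,),
    "DaNetQA": (0.661,),
    "RuCoS": (0.4, 0.395),
}


def total_score(task_scores):
    """Mean over tasks; tasks with two metrics contribute their average."""
    per_task = [sum(values) / len(values) for values in task_scores.values()]
    return sum(per_task) / len(per_task)


print(round(total_score(scores), 3))  # 0.505
```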
