Russian SuperGLUE

Dataset	Score	Metric
LiDiRus	0.591	Matthews Corr
RCB	0.597 / 0.594	F1/Acc
PARus	0.916	Accuracy
MuSeRC	0.946 / 0.837	F1a/EM
TERRa	0.927	Accuracy
RUSSE	0.739	Accuracy
RWSD	0.844	Accuracy
DaNetQA	0.933	Accuracy
RuCoS	0.82 / 0.797	F1/EM

Model description:

Tikhomirov, M.M. and Chernyshev, D.I., 2024. Improving Large Language Model Russian Adaptation with Preliminary Vocabulary Optimization. Lobachevskii Journal of Mathematics (forthcoming).

ruadapt Solar-10.7, tuned on the train sets using https://github.com/IlyaGusev/rulm

Parameter description:

Tested using the https://github.com/IlyaGusev/rulm repo.

Training config:

{
    "trainer": {
        "evaluation_strategy": "steps",
        "per_device_train_batch_size": 1,
        "per_device_eval_batch_size": 1,
        "gradient_accumulation_steps": 128,
        "eval_steps": 100,
        "save_steps": 100,
        "logging_steps": 5,
        "learning_rate": 0.00025,
        "num_train_epochs": 3,
        "lr_scheduler_type": "cosine",
        "warmup_steps": 30,
        "fp16": true,
        "bf16": false,
        "torch_compile": false,
        "optim": "adamw_torch"
    },
    "lora": {
        "r": 16,
        "lora_alpha": 16,
        "lora_dropout": 0.05,
        "bias": "none",
        "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"],
        "task_type": "CAUSAL_LM"
    },
    "load_in_8bit": false,
    "only_target_loss": true,
    "mode": "chat",
    "templates_path": "internal_prompts/saiga_v2.json",
    "model_name": "/data/models/gpt/solar/ruadapt_solar_10.7_darulm_unigram_proj_init_part2_v3_alpha_scale_2",
    "model_type": "causal",
    "max_tokens_count": 1024
}
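Two quantities implied by the training config can be derived with simple arithmetic: the number of sequences per optimizer step and the LoRA scaling factor. The sketch below is illustrative only. The device count is not stated in the submission, so `num_devices = 1` is an assumption, and the scaling formula `lora_alpha / r` is the standard LoRA convention rather than something the submission confirms. The helper names are hypothetical.

```python
# Values copied from the training config; the key names are flattened for brevity.
config = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 128,
    "lora_r": 16,
    "lora_alpha": 16,
}

# Assumption: the submission does not say how many devices were used.
num_devices = 1


def effective_batch_size(cfg, devices):
    """Sequences contributing to one optimizer step."""
    return (
        cfg["per_device_train_batch_size"]
        * cfg["gradient_accumulation_steps"]
        * devices
    )


def lora_scale(cfg):
    """Standard LoRA scaling applied to the low-rank update (alpha / r)."""
    return cfg["lora_alpha"] / cfg["lora_r"]


print(effective_batch_size(config, num_devices))  # 128 under the one-device assumption
print(lora_scale(config))  # 1.0
```

With more devices, the effective batch size scales linearly with the device count.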

Diagnostic (Matthews Correlation): 0.591

Category	Score
LOGIC	0.5182171466476416
KNOWLEDGE	0.5686318492790999
PREDICATE-ARGUMENT STRUCTURE	0.5990987736402609
LEXICAL SEMANTICS	0.6824189078791257

Lexical Semantics - Lexical Entailment	0.728252704039083
Lexical Semantics - Morphological Negation	0.5114621455194985
Lexical Semantics - Factivity	0.4050074169097869
Lexical Semantics - Symmetry/Collectivity	0.9128709291752769
Lexical Semantics - Redundancy	0.45652173913043476
Lexical Semantics - Named Entities	0.7826237921249264
Lexical Semantics - Quantifiers	0.5936523178536028
Predicate-Argument Structure - Core Args	0.7242730345466608
Predicate-Argument Structure - Prepositional Phrases	0.6588289607878823
Predicate-Argument Structure - Ellipsis/Implicits	0.6963106238227914
Predicate-Argument Structure - Anaphora/Coreference	0.4879330934705123
Predicate-Argument Structure - Active/Passive	0.6562179588897107
Predicate-Argument Structure - Nominalization	0.8006407690254357
Predicate-Argument Structure - Genitives/Partitives	0.6875
Predicate-Argument Structure - Datives	0.629940788348712
Predicate-Argument Structure - Relative Clauses	0.5222329678670935
Predicate-Argument Structure - Coordination Scopes	0.5248390246530005
Predicate-Argument Structure - Intersectivity	0.4496144015129485
Predicate-Argument Structure - Restrictivity	0.4963635881027162
Logic - Negation	0.30739085537373884
Logic - Double Negation	0.6492207662311682
Logic - Interval/Numbers	0.433289122413121
Logic - Conjunction	0.6333794997024097
Logic - Disjunction	0.5277790490704242
Logic - Conditionals	0.6507936507936508
Logic - Universal	0.6700593942604899
Logic - Existential	0.24232015747572203
Logic - Temporal	0.6798418006783489
Logic - Upward Monotone	0.7894736842105263
Logic - Downward Monotone	0.032570825073951766
Logic - Non-Monotonic	0.45044261646145084
Knowledge - Common Sense	0.5274820157194837
Knowledge - World Knowledge	0.6076559095877964
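The diagnostic scores are Matthews correlation coefficients, which range from -1 to 1, with 0 meaning no better than chance. As a minimal sketch of the metric itself (not the leaderboard's scoring code), the function below computes it for binary labels from confusion-matrix counts. The function name and the toy labels are made up for illustration.

```python
import math


def matthews_corr(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: return 0 when any marginal count is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example: one of the two positive items is missed.
print(matthews_corr([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because the metric uses all four confusion-matrix cells, a category with a skewed label distribution can score near zero even when accuracy looks reasonable.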

Performance:

Dataset	Speed	RAM
LiDiRus	-	-
RCB	-	-
PARus	-	-
MuSeRC	-	-
TERRa	-	-
RUSSE	-	-
RWSD	-	-
DaNetQA	-	-
RuCoS	-	-

Submission ruadapt Solar 10.7 twostage

Total score: 0.805
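The total score is consistent with an unweighted mean over the nine tasks, where a task reporting two metrics is first averaged into a single value. That aggregation rule is an assumption checked against the reported numbers, not something stated in the submission. The sketch below reproduces the arithmetic, with the scores copied from the results table.

```python
# Per-task scores from the results table; two-metric tasks keep both values.
task_scores = {
    "LiDiRus": [0.591],
    "RCB": [0.597, 0.594],
    "PARus": [0.916],
    "MuSeRC": [0.946, 0.837],
    "TERRa": [0.927],
    "RUSSE": [0.739],
    "RWSD": [0.844],
    "DaNetQA": [0.933],
    "RuCoS": [0.82, 0.797],
}


def total_score(scores):
    """Average each task's metrics, then average across tasks."""
    per_task = [sum(values) / len(values) for values in scores.values()]
    return sum(per_task) / len(per_task)


print(round(total_score(task_scores), 3))  # 0.805
```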
