Russian SuperGLUE

Dataset	Result	Metric
LiDiRus	0.417	Matthews correlation coefficient
RCB	0.545 / 0.555	F1/Accuracy
PARus	0.756	Accuracy
MuSeRC	0.894 / 0.695	F1a/EM
TERRa	0.876	Accuracy
RUSSE	0.668	Accuracy
RWSD	0.708	Accuracy
DaNetQA	0.878	Accuracy
RuCoS	0.76 / 0.733	F1/EM
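RCB is a three-class task (entailment, contradiction, neutral) and is reported as F1/Accuracy. The sketch below shows how both numbers could be computed in pure Python. It assumes that F1 here is the unweighted (macro) average of per-class F1, as in the English SuperGLUE CB task. The toy labels are illustrative only and are not RCB data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 (assumed RCB-style averaging)."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical toy predictions, not taken from the submission.
gold = ["entailment", "contradiction", "neutral", "entailment"]
pred = ["entailment", "neutral", "neutral", "contradiction"]
print(accuracy(gold, pred))  # 0.5
print(macro_f1(gold, pred))  # mean of per-class F1 scores 0, 2/3, 2/3
```

A class that never appears correctly contributes an F1 of 0. This is why macro-F1 can fall well below accuracy on an imbalanced three-class set.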

Model description:

Tikhomirov M., Chernyshev D. Impact of Tokenization on LLaMa Russian Adaptation. arXiv preprint arXiv:2312.02598, 2023.

Parameter description:

Tested using the https://github.com/IlyaGusev/rulm repo with the following config:

{
  "trainer": {
    "evaluation_strategy": "steps",
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "gradient_accumulation_steps": 32,
    "eval_steps": 50,
    "save_steps": 50,
    "logging_steps": 5,
    "learning_rate": 0.00025,
    "num_train_epochs": 3,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 30,
    "fp16": true,
    "bf16": false,
    "torch_compile": false,
    "optim": "adamw_torch"
  },
  "lora": {
    "r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "bias": "none",
    "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"],
    "task_type": "CAUSAL_LM"
  },
  "load_in_8bit": false,
  "only_target_loss": true,
  "mode": "chat",
  "templates_path": "internal_prompts/saiga_v2.json",
  "model_name": "llama2_7b_darulm_unigram_tie_2e_16_11_23",
  "model_type": "causal",
  "max_tokens_count": 1024
}
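Several quantities follow directly from this config, and the sketch below computes four of them:

- the effective batch size;
- the LoRA scaling factor;
- the number of trainable adapter parameters;
- the learning rate under a linear-warmup cosine schedule, shaped like the "cosine" scheduler in Hugging Face transformers.

The sketch rests on assumptions the submission does not state: a single GPU, and the standard LLaMA-2 7B shapes (hidden size 4096, 32 decoder layers, square q/k/v/o projections). The `total_steps` value passed to `cosine_lr` is hypothetical, because it depends on the training set size.

```python
import math

# Values copied from the config above (only the fields used here).
CONFIG = {
    "trainer": {
        "per_device_train_batch_size": 4,
        "gradient_accumulation_steps": 32,
        "learning_rate": 0.00025,
        "warmup_steps": 30,
    },
    "lora": {"r": 16, "lora_alpha": 16,
             "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"]},
    "max_tokens_count": 1024,
}

# Assumptions (not stated in the submission): one GPU, standard LLaMA-2 7B shapes.
NUM_GPUS = 1
HIDDEN_SIZE = 4096  # q/k/v/o are HIDDEN_SIZE x HIDDEN_SIZE in LLaMA-2 7B
NUM_LAYERS = 32


def effective_batch_size(cfg, num_gpus=NUM_GPUS):
    """Sequences contributing to one optimizer step."""
    t = cfg["trainer"]
    return t["per_device_train_batch_size"] * t["gradient_accumulation_steps"] * num_gpus


def lora_scaling(cfg):
    """LoRA adds (alpha / r) * B @ A to each frozen target weight."""
    return cfg["lora"]["lora_alpha"] / cfg["lora"]["r"]


def lora_trainable_params(cfg, hidden=HIDDEN_SIZE, layers=NUM_LAYERS):
    """A is r x d_in and B is d_out x r for every adapted projection."""
    r = cfg["lora"]["r"]
    per_module = r * (hidden + hidden)
    return per_module * len(cfg["lora"]["target_modules"]) * layers


def cosine_lr(step, total_steps, cfg):
    """Linear warmup, then half-cosine decay to zero."""
    t = cfg["trainer"]
    base, warmup = t["learning_rate"], t["warmup_steps"]
    if step < warmup:
        return base * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(CONFIG))                               # 128
print(effective_batch_size(CONFIG) * CONFIG["max_tokens_count"])  # 131072 tokens at most
print(lora_scaling(CONFIG))                                       # 1.0
print(lora_trainable_params(CONFIG))                              # 16777216
```

The trainer counts `warmup_steps`, `eval_steps` and `save_steps` in optimizer steps. Under the single-GPU assumption, each such step therefore covers 128 sequences.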

Diagnostics: 0.417

Category	Result
LOGIC	0.3472109173849461
KNOWLEDGE	0.3988089571509749
PREDICATE-ARGUMENT STRUCTURE	0.4107842841297232
LEXICAL SEMANTICS	0.4999636017616265

Lexical Semantics - Lexical Entailment	0.5648145000600431
Lexical Semantics - Morphological Negation	0.39477101697586137
Lexical Semantics - Factivity	0.4264014327112209
Lexical Semantics - Symmetry/Collectivity	0.31622776601683794
Lexical Semantics - Redundancy	0.17951965741648493
Lexical Semantics - Named Entities	0.612056372482123
Lexical Semantics - Quantifiers	0.3638714534805853
Predicate-Argument Structure Core Args	0.30962962962962964
Predicate-Argument Structure Prepositional Phrases	0.3943052207977212
Predicate-Argument Structure Ellipsis/Implicits	0.6205427141408237
Predicate-Argument Structure Anaphora/Coreference	0.4144368375833978
Predicate-Argument Structure Active/Passive	0.4083133966424866
Predicate-Argument Structure Nominalization	0.5444357229372963
Predicate-Argument Structure Genitives/Partitives	0.6813851438692469
Predicate-Argument Structure Datives	0.3563483225498992
Predicate-Argument Structure Relative Clauses	0.6407232755171874
Predicate-Argument Structure Coordination Scopes	0.14169568340005298
Predicate-Argument Structure Intersectivity	0.42208132696637884
Predicate-Argument Structure Restrictivity	0.36900620230837305
Logic Negation	0.473553991329486
Logic Double Negation	0.3892494720807615
Logic Interval/Numbers	0.08779776400125335
Logic Conjunction	0.33734954246999327
Logic Disjunction	0.4365575409204501
Logic Conditionals	0.3730235484764954
Logic Universal	0.3959441875175622
Logic Existential	0.17910620335162064
Logic Temporal	0.11891767800211263
Logic Upward Monotone	0.7009124021507408
Logic Downward Monotone	0.16146816171752817
Logic Non-Monotonic	0.39405520311955033
Knowledge Common Sense	0.3600431599767098
Knowledge World Knowledge	0.4447466812418805
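The LiDiRus diagnostic score is a Matthews correlation coefficient. The sketch below shows how per-category values of this kind could be computed, assuming each category value is the same coefficient restricted to that category's examples. It makes further assumptions:

- Labels are binary, encoded as 1 (entailment) and 0 (not entailment).
- The input is a hypothetical list of `(categories, gold, pred)` tuples.
- An example tagged with several categories counts toward each of them.
- The function returns 0.0 when the coefficient is undefined, following the convention scikit-learn uses.

```python
import math
from collections import defaultdict


def matthews_corrcoef(y_true, y_pred):
    """Binary Matthews correlation coefficient; 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcc_by_category(examples):
    """examples: iterable of (categories, gold, pred) with 0/1 labels (hypothetical format)."""
    groups = defaultdict(lambda: ([], []))
    for categories, gold, pred in examples:
        for category in categories:
            groups[category][0].append(gold)
            groups[category][1].append(pred)
    return {cat: matthews_corrcoef(g, p) for cat, (g, p) in groups.items()}


# Hypothetical toy examples, not taken from LiDiRus.
toy = [
    (["Logic Negation"], 1, 1),
    (["Logic Negation", "Knowledge Common Sense"], 0, 0),
    (["Knowledge Common Sense"], 1, 0),
]
print(mcc_by_category(toy))  # {'Logic Negation': 1.0, 'Knowledge Common Sense': 0.0}
```

Unlike accuracy, this coefficient is 0 for a model that always predicts the majority class. Small categories with few examples can therefore swing widely, as Logic Interval/Numbers (0.088) and Logic Upward Monotone (0.701) illustrate.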

Performance:

Dataset	Speed	RAM
LiDiRus	-	-
RCB	-	-
PARus	-	-
MuSeRC	-	-
TERRa	-	-
RUSSE	-	-
RWSD	-	-
DaNetQA	-	-
RuCoS	-	-

New submission: ruadapt LLaMA-2 7B LoRA

Baseline result: 0.71
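The per-task results at the top of this page average to the reported 0.71 under a specific aggregation rule, which the sketch below assumes:

1. Reduce each two-metric task (RCB, MuSeRC, RuCoS) to the mean of its two metrics.
2. Weight all nine tasks equally and take the plain mean.

The sketch reproduces the number, but it is not taken from the leaderboard's own scoring code.

```python
# Per-task scores copied from the results table; two-metric tasks keep both values.
TASK_SCORES = {
    "LiDiRus": (0.417,),
    "RCB": (0.545, 0.555),
    "PARus": (0.756,),
    "MuSeRC": (0.894, 0.695),
    "TERRa": (0.876,),
    "RUSSE": (0.668,),
    "RWSD": (0.708,),
    "DaNetQA": (0.878,),
    "RuCoS": (0.76, 0.733),
}


def overall_score(task_scores):
    """Assumed rule: mean of metrics within a task, then unweighted mean over tasks."""
    per_task = [sum(values) / len(values) for values in task_scores.values()]
    return sum(per_task) / len(per_task)


print(round(overall_score(TASK_SCORES), 2))  # 0.71
```

The unrounded value is about 0.7104 (6.394 / 9).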

Model description:

Parameter description:

Diagnostics: 0.417

Performance: