Russian SuperGLUE

Dataset	Result	Metric
LiDiRus	0.421	Matthews correlation coefficient (MCC)
RCB	0.311 / 0.441	F1 / Accuracy
PARus	0.806	Accuracy
MuSeRC	0.882 / 0.666	F1a / EM
TERRa	0.831	Accuracy
RUSSE	0.723	Accuracy
RWSD	0.669	Accuracy
DaNetQA	0.735	Accuracy
RuCoS	0.910 / 0.911	F1 / EM

Model description:

Here we evaluate the encoder-only part of the pretrained FRED-T5-1.7B model (https://russiansuperglue.com/login/submit_info/1936), which has 760M parameters, about 2.2 times fewer than the full model. We fine-tune the model separately for each Russian SuperGLUE task. For LiDiRus, we start from the TERRa-fine-tuned model. For RWSD, we use the majority-class baseline. For each task, we submit the best-performing checkpoint according to validation metrics (checkpoints are saved every epoch, and more frequently for RCB, PARus and RuCoS). No filters or fixes were applied to the datasets.
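Two of the procedures described above, the majority-class baseline used for RWSD and picking the best checkpoint by a validation metric, are small enough to sketch directly. The function names and the toy labels and scores below are hypothetical illustrations, not the submission's actual code or data; the sketch assumes labels are available as a plain Python list and that a validation score has already been computed for each saved checkpoint.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def majority_class_predictions(train_labels: Sequence[Hashable], n_test: int) -> List[Hashable]:
    """Predict the most frequent training label for every test example."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    return [majority_label] * n_test


def best_checkpoint(val_scores: Dict[str, float]) -> str:
    """Return the name of the checkpoint with the highest validation score."""
    return max(val_scores, key=val_scores.get)


# Hypothetical toy data, NOT the real RWSD labels or checkpoint scores.
preds = majority_class_predictions([False, False, True, False, True], n_test=3)
print(preds)  # [False, False, False]

scores = {"epoch-1": 0.61, "epoch-2": 0.68, "epoch-3": 0.66}
print(best_checkpoint(scores))  # epoch-2
```

A majority-class predictor ignores the input entirely, so its accuracy equals the share of the majority label in the evaluation set.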

Hyperparameters:

Fine-tuning hyperparameters: batch size 16; epochs {10, 20, 30}; learning rate {1e-6, 1e-5, 2e-5, 3e-5, 1e-4}; linear learning-rate scheduler; warmup ratio {0.02, 0.05}; weight decay {0, 0.01, 0.1}.
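The values in braces define a search space. The sketch below enumerates it as a full grid, which is an assumption: the description does not say whether every combination was actually tried. It also shows one common form of a "linear" schedule, linear warmup followed by linear decay to zero, which mirrors the shape of Hugging Face Transformers' `get_linear_schedule_with_warmup`; whether the submission used exactly this schedule is likewise an assumption. `GRID`, `iter_configs` and `linear_lr` are hypothetical names.

```python
import itertools
import math

# Search space from the hyperparameter line above (layout is illustrative).
GRID = {
    "batch_size": [16],
    "epochs": [10, 20, 30],
    "lr": [1e-6, 1e-5, 2e-5, 3e-5, 1e-4],
    "warmup_ratio": [0.02, 0.05],
    "weight_decay": [0.0, 0.01, 0.1],
}


def iter_configs(grid):
    """Yield every combination of grid values as a dict (full grid, assumed)."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def linear_lr(step, total_steps, peak_lr, warmup_ratio):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


configs = list(iter_configs(GRID))
print(len(configs))  # 90 = 1 * 3 * 5 * 2 * 3

# Example: 1000 optimisation steps, 5% warmup (50 steps), peak lr 2e-5.
print(linear_lr(25, 1000, 2e-5, 0.05))    # halfway through warmup: 1e-05
print(linear_lr(1000, 1000, 2e-5, 0.05))  # end of training: 0.0
```

Run independently for each of the nine tasks, a full grid of this size means 90 fine-tuning runs per task, which is why the grid is usually pruned in practice.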

Diagnostics: 0.421

Category	Result
LOGIC	0.255
KNOWLEDGE	0.357
PREDICATE-ARGUMENT STRUCTURE	0.453
LEXICAL SEMANTICS	0.493

Lexical Semantics - Lexical Entailment	0.477
Lexical Semantics - Morphological Negation	0.395
Lexical Semantics - Factivity	0.423
Lexical Semantics - Symmetry/Collectivity	0.324
Lexical Semantics - Redundancy	0.273
Lexical Semantics - Named Entities	0.612
Lexical Semantics - Quantifiers	0.316
Predicate-Argument Structure - Core Args	0.549
Predicate-Argument Structure - Prepositional Phrases	0.659
Predicate-Argument Structure - Ellipsis/Implicits	0.526
Predicate-Argument Structure - Anaphora/Coreference	0.335
Predicate-Argument Structure - Active/Passive	0.284
Predicate-Argument Structure - Nominalization	0.502
Predicate-Argument Structure - Genitives/Partitives	0.157
Predicate-Argument Structure - Datives	0.764
Predicate-Argument Structure - Relative Clauses	0.333
Predicate-Argument Structure - Coordination Scopes	0.509
Predicate-Argument Structure - Intersectivity	0.397
Predicate-Argument Structure - Restrictivity	0.287
Logic - Negation	0.129
Logic - Double Negation	0.213
Logic - Interval/Numbers	0.011
Logic - Conjunction	0.568
Logic - Disjunction	0.164
Logic - Conditionals	0.071
Logic - Universal	0.255
Logic - Existential	0.206
Logic - Temporal	0.339
Logic - Upward Monotone	0.821
Logic - Downward Monotone	-0.302
Logic - Non-Monotonic	0.236
Knowledge - Common Sense	0.359
Knowledge - World Knowledge	0.339
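LiDiRus and each of its diagnostic categories are scored with the Matthews correlation coefficient, as stated in the results table. A minimal binary MCC computed from confusion-matrix counts is sketched below; it assumes labels are mapped to 1 = entailment and 0 = not entailment (that mapping is an assumption for illustration), and it returns 0 when a marginal is empty, a common convention.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary Matthews correlation coefficient for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # If the predictions (or gold labels) are all one class, the
    # denominator is 0; report 0 (no correlation) in that case.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


print(matthews_corrcoef([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0 (perfect)
print(matthews_corrcoef([1, 1, 0, 0], [0, 0, 1, 1]))  # -1.0 (always wrong)
print(matthews_corrcoef([1, 1, 0, 0], [1, 1, 1, 1]))  # 0.0 (constant prediction)
```

MCC ranges from -1 to 1: 0 corresponds to chance-level or constant predictions, and a negative value, such as the Downward Monotone score, means the model's predictions on that category are anti-correlated with the gold labels.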

Performance:

Dataset	Speed	RAM
LiDiRus	-	-
RCB	-	-
PARus	-	-
MuSeRC	-	-
TERRa	-	-
RUSSE	-	-
RWSD	-	-
DaNetQA	-	-
RuCoS	-	-

New submission: FRED-T5 1.7B (encoder only, 760M), fine-tuned

Overall result: 0.694
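The overall result of 0.694 can be reproduced from the per-task results table under an assumed aggregation rule: the two metrics of RCB, MuSeRC and RuCoS are averaged first, and then an unweighted mean is taken over all nine tasks. The rule is an assumption here (it is not stated on this page), although it matches the reported number; `TASK_SCORES`, `task_score` and `overall_score` are hypothetical names.

```python
# Per-task results copied from the table at the top; two-metric tasks
# are stored as tuples and averaged first (assumed aggregation rule).
TASK_SCORES = {
    "LiDiRus": 0.421,
    "RCB": (0.311, 0.441),
    "PARus": 0.806,
    "MuSeRC": (0.882, 0.666),
    "TERRa": 0.831,
    "RUSSE": 0.723,
    "RWSD": 0.669,
    "DaNetQA": 0.735,
    "RuCoS": (0.910, 0.911),
}


def task_score(value):
    """Average a task's metrics; a single float is returned unchanged."""
    if isinstance(value, tuple):
        return sum(value) / len(value)
    return value


def overall_score(scores):
    """Unweighted mean of the per-task scores."""
    return sum(task_score(v) for v in scores.values()) / len(scores)


print(round(overall_score(TASK_SCORES), 3))  # 0.694
```

The unrounded mean is about 0.6939, which rounds to the reported value.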
