Submission Qwen 4B saiga zero-shot

April 21, 2024, 1 a.m.

Team: Maxim Bolgov

Model URL: https://huggingface.co/Defetya/qwen-4B-saiga


Total score: 0.505

Dataset Score Metric
LiDiRus 0.274 Matthews Corr
RCB 0.361 / 0.493 F1/Acc
PARus 0.554 Accuracy
MuSeRC 0.656 / 0.112 F1a/EM
TERRa 0.655 Accuracy
RUSSE 0.57 Accuracy
RWSD 0.623 Accuracy
DaNetQA 0.661 Accuracy
RuCoS 0.4 / 0.395 F1/EM
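The total score of 0.505 is consistent with an unweighted mean over the nine tasks, where a task that reports two metrics first contributes the average of the pair. That aggregation rule is an assumption checked only against the numbers in the table above, not something this page states. A minimal sketch (the names `scores`, `task_score` and `total_score` are illustrative):

```python
# Per-task metrics copied from the table above.
# Tasks that report two metrics keep both values in a tuple.
scores = {
    "LiDiRus": (0.274,),
    "RCB": (0.361, 0.493),
    "PARus": (0.554,),
    "MuSeRC": (0.656, 0.112),
    "TERRa": (0.655,),
    "RUSSE": (0.57,),
    "RWSD": (0.623,),
    "DaNetQA": (0.661,),
    "RuCoS": (0.4, 0.395),
}


def task_score(metrics):
    """Average the metrics reported for one task (assumed aggregation rule)."""
    return sum(metrics) / len(metrics)


def total_score(all_scores):
    """Unweighted mean of the per-task scores."""
    per_task = [task_score(m) for m in all_scores.values()]
    return sum(per_task) / len(per_task)


print(round(total_score(scores), 3))  # 0.505
```

Under this rule the low MuSeRC exact-match value (0.112) and the RuCoS pair pull the total down the most.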
Model description:

Qwen 1.5 4B by Alibaba, fully fine-tuned on Ilya Gusev's Saiga dataset. Both training and evaluation ran on a Google TPU v4-32 provided by TRC (TPU Research Cloud). The evaluation script is available at https://github.com/defdet/rulm/blob/master/self_instruct/src/benchmarks/eval_rsg_tpu.py.


Parameter description:

Diagnostic (Matthews Correlation): 0.274

Category Score
LOGIC 0.1982492693072155
KNOWLEDGE 0.26419115612435906
PREDICATE-ARGUMENT STRUCTURE 0.24601845359596022
LEXICAL SEMANTICS 0.3219335477393724
Lexical Semantics - Lexical Entailment 0.36425098156923136
Lexical Semantics - Morphological Negation 0.4147575310031266
Lexical Semantics - Factivity 0.34641016151377546
Lexical Semantics - Symmetry/Collectivity -0.12171612389003691
Lexical Semantics - Redundancy -0.08695652173913043
Lexical Semantics - Named Entities 0.2891574659831201
Lexical Semantics - Quantifiers 0.19045970117256472
Predicate-Argument Structure Core Args 0.2297707831614211
Predicate-Argument Structure Prepositional Phrases 0.34665482822302573
Predicate-Argument Structure Ellipsis/Implicits 0.10622957319984967
Predicate-Argument Structure Anaphora/Coreference 0.1667000100033345
Predicate-Argument Structure Active/Passive 0.27640672769878033
Predicate-Argument Structure Nominalization 0.35355339059327373
Predicate-Argument Structure Genitives/Partitives 0.4900980294098034
Predicate-Argument Structure Datives 0.5091750772173156
Predicate-Argument Structure Relative Clauses 0.34097520463683817
Predicate-Argument Structure Coordination Scopes 0.16834512458535864
Predicate-Argument Structure Intersectivity 0.302433025214821
Predicate-Argument Structure Restrictivity -0.009732619374563063
Logic Negation 0.3208739674079263
Logic Double Negation 0.3481553119113957
Logic Interval/Numbers -0.16884185384856412
Logic Conjunction 0.12909944487358058
Logic Disjunction 0.41702882811414954
Logic Conditionals 0.21821789023599236
Logic Universal 0.15228622596829317
Logic Existential 0.5384615384615384
Logic Temporal -0.09449929894737882
Logic Upward Monotone 0.41130637283031135
Logic Downward Monotone -0.1504142093990467
Logic Non-Monotonic 0.017197979012252743
Knowledge Common Sense 0.19707513098159088
Knowledge World Knowledge 0.33004157719410765
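The category scores above are Matthews correlation values, which range from -1 to 1. A value near 0 means the predictions are no better than chance, and a negative value (as in Symmetry/Collectivity or Interval/Numbers) means they disagree with the gold labels more often than chance would. A minimal sketch of the binary form of the metric, assuming two-class labels; the function name and the confusion counts below are made up for illustration and are not taken from this submission:

```python
import math


def matthews_corr(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0


# Illustrative counts only (not from the evaluation run).
print(matthews_corr(50, 50, 0, 0))    # perfect agreement
print(matthews_corr(25, 25, 25, 25))  # chance-level predictions
print(matthews_corr(0, 0, 50, 50))    # every prediction inverted
```

Because the metric uses all four cells of the confusion matrix, a small diagnostic category with few examples can swing sharply between positive and negative values.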

Performance:

Dataset Speed RAM
LiDiRus - -
RCB - -
PARus - -
MuSeRC - -
TERRa - -
RUSSE - -
RWSD - -
DaNetQA - -
RuCoS - -