Oct. 11, 2023, 3:23 p.m.
Team: Saiga team
Dataset | Score | Metric |
---|---|---|
LiDiRus | 0.46 | Matthews Corr |
RCB | 0.529 / 0.573 | F1/Acc |
PARus | 0.824 | Accuracy |
MuSeRC | 0.927 / 0.787 | F1a/EM |
TERRa | 0.888 | Accuracy |
RUSSE | 0.758 | Accuracy |
RWSD | 0.786 | Accuracy |
DaNetQA | 0.919 | Accuracy |
RuCoS | 0.83 / 0.816 | F1/EM |
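For readers unfamiliar with the LiDiRus metric, here is a minimal sketch of the Matthews correlation coefficient, assuming binary 0/1 labels. The function name and the toy labels are illustrative only; the leaderboard's own scoring code is not shown on this page and may differ in detail.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Binary Matthews correlation coefficient for 0/1 labels (illustrative sketch)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: the score is defined as 0 when any marginal is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels, not taken from any RSG dataset.
toy_true = [1, 1, 0, 0]
toy_pred = [1, 0, 0, 0]
toy_score = matthews_corrcoef(toy_true, toy_pred)
```

Unlike accuracy, the score ranges from -1 to 1, and 0 corresponds to chance-level agreement.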
Model: Mistral-7B-v0.1, LoRA-tuned on the RSG (Russian SuperGLUE) sets.

Mistral-7B-v0.1 was first trained in a multi-task setup with the main LoRA config, and the resulting adapter was merged into the base model. Task-level LoRA adapters were then trained on top of this merged model.

Inference script: https://github.com/IlyaGusev/rulm/blob/master/self_instruct/src/benchmarks/eval_lora_rsg.py

Training script: https://github.com/IlyaGusev/rulm/blob/master/self_instruct/src/train.py

Main LoRA config: https://github.com/IlyaGusev/rulm/blob/master/self_instruct/configs/mistral_7b_rsg.json (configs for the separate tasks are in the same folder)

Zero-shot evaluation script: https://github.com/IlyaGusev/rulm/blob/master/self_instruct/src/benchmarks/eval_zs_rsg.py
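The two-stage setup described above (merge a multi-task adapter, then train task adapters on the merged weights) can be illustrated with a toy sketch of the standard LoRA merge rule, W' = W + (alpha / r) * B A. Everything here is an assumption for illustration: the helper names, the 2x2 matrices, and the alpha and rank values are hypothetical and are not taken from the linked configs or scripts.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    Shapes: weight is d_out x d_in, lora_b is d_out x rank, lora_a is rank x d_in.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Stage 1 (toy values): fold the multi-task adapter into the base weight.
base_weight = [[1, 0], [0, 1]]
multi_task_b, multi_task_a = [[1], [0]], [[0, 2]]
merged_base = merge_lora(base_weight, multi_task_a, multi_task_b, alpha=1, rank=1)

# Stage 2 (toy values): a task-level adapter sits on top of the merged weight.
# Its effective weight at inference equals the same formula applied to merged_base.
task_b, task_a = [[0], [1]], [[3, 0]]
task_weight = merge_lora(merged_base, task_a, task_b, alpha=2, rank=1)
```

In this sketch the task adapter is folded in only to show the effective weight; in practice a task-level adapter can also be kept separate and swapped per task while the merged base stays fixed.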
Category | Score |
---|---|
LOGIC | 0.4272477243463581 |
KNOWLEDGE | 0.4267562897825355 |
PREDICATE-ARGUMENT STRUCTURE | 0.4644487887609425 |
LEXICAL SEMANTICS | 0.5423872872028785 |

Category | Score |
---|---|
Lexical Semantics - Lexical Entailment | 0.5684993638729131 |
Lexical Semantics - Morphological Negation | 0.6172133998483676 |
Lexical Semantics - Factivity | 0.29814239699997197 |
Lexical Semantics - Symmetry/Collectivity | 0.6243713415848884 |
Lexical Semantics - Redundancy | 0.1444869078105018 |
Lexical Semantics - Named Entities | 0.6708203932499369 |
Lexical Semantics - Quantifiers | 0.4287214448277836 |
Predicate-Argument Structure Core Args | 0.5051219141436532 |
Predicate-Argument Structure Prepositional Phrases | 0.6207200740216239 |
Predicate-Argument Structure Ellipsis/Implicits | 0.4992872412627317 |
Predicate-Argument Structure Anaphora/Coreference | 0.3682002176496948 |
Predicate-Argument Structure Active/Passive | 0.623033246356214 |
Predicate-Argument Structure Nominalization | 0.40881490876633847 |
Predicate-Argument Structure Genitives/Partitives | 0.5773502691896257 |
Predicate-Argument Structure Datives | 0.28511240114923325 |
Predicate-Argument Structure Relative Clauses | 0.42289003161103106 |
Predicate-Argument Structure Coordination Scopes | 0.48038446141526137 |
Predicate-Argument Structure Intersectivity | 0.3629539763832752 |
Predicate-Argument Structure Restrictivity | 0.46549138385896505 |
Logic Negation | 0.631059217297185 |
Logic Double Negation | 0.420084025208403 |
Logic Interval/Numbers | 0.42051713353118003 |
Logic Conjunction | 0.6713171133426189 |
Logic Disjunction | 0.3768673314407158 |
Logic Conditionals | 0.2698412698412698 |
Logic Universal | 0.6700593942604899 |
Logic Existential | 0.3144854510165755 |
Logic Temporal | 0.33910215700436014 |
Logic Upward Monotone | 0.4146442144313646 |
Logic Downward Monotone | 0.1438234930593239 |
Logic Non-Monotonic | 0.2895702534395041 |
Knowledge Common Sense | 0.39380049095855213 |
Knowledge World Knowledge | 0.4619338817592217 |
Dataset | Speed | RAM |
---|---|---|
LiDiRus | - | - |
RCB | - | - |
PARus | - | - |
MuSeRC | - | - |
TERRa | - | - |
RUSSE | - | - |
RWSD | - | - |
DaNetQA | - | - |
RuCoS | - | - |