Sept. 4, 2023, 7:48 a.m.
Team: Saiga team
| Dataset | Score | Metric |
|---|---|---|
| LiDiRus | 0.365 | Matthews Corr |
| RCB | 0.385 / 0.461 | F1/Acc |
| PARus | 0.82 | Accuracy |
| MuSeRC | 0.669 / 0.098 | F1a/EM |
| TERRa | 0.811 | Accuracy |
| RUSSE | 0.59 | Accuracy |
| RWSD | 0.831 | Accuracy |
| DaNetQA | 0.878 | Accuracy |
| RuCoS | 0.69 / 0.678 | F1/EM |
LLaMA-70B fine-tuned for Russian on several datasets. See https://github.com/IlyaGusev/rulm/tree/master/self_instruct
Evaluation code: https://github.com/IlyaGusev/rulm/blob/master/self_instruct/src/eval_rsg.py
| Category | Score |
|---|---|
| LOGIC | 0.3228855331810909 |
| KNOWLEDGE | 0.3518320326381725 |
| PREDICATE-ARGUMENT STRUCTURE | 0.3516834681361791 |
| LEXICAL SEMANTICS | 0.4558592159876882 |
| Lexical Semantics - Lexical Entailment | 0.42364268926359044 |
| Lexical Semantics - Morphological Negation | 0.6126374746329801 |
| Lexical Semantics - Factivity | 0.34641016151377546 |
| Lexical Semantics - Symmetry/Collectivity | 0.6454972243679028 |
| Lexical Semantics - Redundancy | 0.6922186552431729 |
| Lexical Semantics - Named Entities | 0.4472135954999579 |
| Lexical Semantics - Quantifiers | 0.39074105418879573 |
| Predicate-Argument Structure - Core Args | 0.5692099788303083 |
| Predicate-Argument Structure - Prepositional Phrases | 0.2735506022160966 |
| Predicate-Argument Structure - Ellipsis/Implicits | 0.39148014634313566 |
| Predicate-Argument Structure - Anaphora/Coreference | 0.28539089649269644 |
| Predicate-Argument Structure - Active/Passive | 0.45241392835886407 |
| Predicate-Argument Structure - Nominalization | 0.7006490497453707 |
| Predicate-Argument Structure - Genitives/Partitives | 0.6666666666666666 |
| Predicate-Argument Structure - Datives | 0.5091750772173156 |
| Predicate-Argument Structure - Relative Clauses | 0.13912166872805048 |
| Predicate-Argument Structure - Coordination Scopes | 0.1175019408963036 |
| Predicate-Argument Structure - Intersectivity | 0.2929462832020831 |
| Predicate-Argument Structure - Restrictivity | 0.0 |
| Logic - Negation | 0.3452479097635773 |
| Logic - Double Negation | 0.4382862782019066 |
| Logic - Interval/Numbers | 0.07147416898918632 |
| Logic - Conjunction | 0.25819888974716115 |
| Logic - Disjunction | 0.3143473067309657 |
| Logic - Conditionals | 0.38331077069024155 |
| Logic - Universal | 0.4264014327112209 |
| Logic - Existential | 0.36689969285267143 |
| Logic - Temporal | 0.3114510614341045 |
| Logic - Upward Monotone | 0.22213082915965962 |
| Logic - Downward Monotone | 0.14511418742827673 |
| Logic - Non-Monotonic | 0.18389242812245682 |
| Knowledge - Common Sense | 0.334653129013519 |
| Knowledge - World Knowledge | 0.3649501415425901 |
| Dataset | Speed | RAM |
|---|---|---|
| LiDiRus | - | - |
| RCB | - | - |
| PARus | - | - |
| MuSeRC | - | - |
| TERRa | - | - |
| RUSSE | - | - |
| RWSD | - | - |
| DaNetQA | - | - |
| RuCoS | - | - |