| Dataset | Score | Metric | 
|---|---|---|
| LiDiRus | 0.274 | Matthews Corr | 
| RCB | 0.361 / 0.493 | F1/Acc | 
| PARus | 0.554 | Accuracy | 
| MuSeRC | 0.656 / 0.112 | F1a/EM | 
| TERRa | 0.655 | Accuracy | 
| RUSSE | 0.57 | Accuracy | 
| RWSD | 0.623 | Accuracy | 
| DaNetQA | 0.661 | Accuracy | 
| RuCoS | 0.4 / 0.395 | F1/EM | 

Qwen 1.5 4B by Alibaba, fully fine-tuned on Ilya Gusev's Saiga dataset. Both training and evaluation ran on a Google TPU v4-32 provided by TRC (TPU Research Cloud). The evaluation script is available at https://github.com/defdet/rulm/blob/master/self_instruct/src/benchmarks/eval_rsg_tpu.py.

| Category | Score | 
|---|---|
| LOGIC | 0.1982492693072155 | 
| KNOWLEDGE | 0.26419115612435906 | 
| PREDICATE-ARGUMENT STRUCTURE | 0.24601845359596022 | 
| LEXICAL SEMANTICS | 0.3219335477393724 | 
| Lexical Semantics - Lexical Entailment | 0.36425098156923136 | 
| Lexical Semantics - Morphological Negation | 0.4147575310031266 | 
| Lexical Semantics - Factivity | 0.34641016151377546 | 
| Lexical Semantics - Symmetry/Collectivity | -0.12171612389003691 | 
| Lexical Semantics - Redundancy | -0.08695652173913043 | 
| Lexical Semantics - Named Entities | 0.2891574659831201 | 
| Lexical Semantics - Quantifiers | 0.19045970117256472 | 
| Predicate-Argument Structure - Core Args | 0.2297707831614211 | 
| Predicate-Argument Structure - Prepositional Phrases | 0.34665482822302573 | 
| Predicate-Argument Structure - Ellipsis/Implicits | 0.10622957319984967 | 
| Predicate-Argument Structure - Anaphora/Coreference | 0.1667000100033345 | 
| Predicate-Argument Structure - Active/Passive | 0.27640672769878033 | 
| Predicate-Argument Structure - Nominalization | 0.35355339059327373 | 
| Predicate-Argument Structure - Genitives/Partitives | 0.4900980294098034 | 
| Predicate-Argument Structure - Datives | 0.5091750772173156 | 
| Predicate-Argument Structure - Relative Clauses | 0.34097520463683817 | 
| Predicate-Argument Structure - Coordination Scopes | 0.16834512458535864 | 
| Predicate-Argument Structure - Intersectivity | 0.302433025214821 | 
| Predicate-Argument Structure - Restrictivity | -0.009732619374563063 | 
| Logic - Negation | 0.3208739674079263 | 
| Logic - Double Negation | 0.3481553119113957 | 
| Logic - Interval/Numbers | -0.16884185384856412 | 
| Logic - Conjunction | 0.12909944487358058 | 
| Logic - Disjunction | 0.41702882811414954 | 
| Logic - Conditionals | 0.21821789023599236 | 
| Logic - Universal | 0.15228622596829317 | 
| Logic - Existential | 0.5384615384615384 | 
| Logic - Temporal | -0.09449929894737882 | 
| Logic - Upward Monotone | 0.41130637283031135 | 
| Logic - Downward Monotone | -0.1504142093990467 | 
| Logic - Non-Monotonic | 0.017197979012252743 | 
| Knowledge - Common Sense | 0.19707513098159088 | 
| Knowledge - World Knowledge | 0.33004157719410765 | 
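
The per-category scores above are presumably Matthews correlation, the metric listed for LiDiRus in the first table. The category table does not state its metric, so treat that as an assumption. Under that assumption, the sketch below shows the binary form of the metric and why some rows can be negative: the value ranges from -1 to 1, and anything below 0 means the predictions agree with the labels less often than chance would. The function name `matthews_corr` and the toy labels are illustrative only and are not taken from the linked evaluation script.

```python
import math


def matthews_corr(y_true, y_pred):
    """Matthews correlation coefficient for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when a row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical toy labels (1 = entailment, 0 = not_entailment).
    gold = [1, 1, 0, 0]
    print(matthews_corr(gold, [1, 0, 0, 0]))  # one missed entailment
    print(matthews_corr(gold, [0, 0, 1, 1]))  # every prediction inverted
```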

| Dataset | Speed | RAM | 
|---|---|---|
| LiDiRus | - | - | 
| RCB | - | - | 
| PARus | - | - | 
| MuSeRC | - | - | 
| TERRa | - | - | 
| RUSSE | - | - | 
| RWSD | - | - | 
| DaNetQA | - | - | 
| RuCoS | - | - |