Dataset | Score | Metric |
---|---|---|
LiDiRus | 0.422 | Matthew`s Corr |
RCB | 0.484 / 0.505 | F1/Acc |
PARus | 0.888 | Accuracy |
MuSeRC | 0.817 / 0.532 | F1a/Em |
TERRa | 0.795 | Accuracy |
RUSSE | 0.596 | Accuracy |
RWSD | 0.714 | Accuracy |
DaNetQA | 0.878 | Accuracy |
RuCoS | 0.68 / 0.667 | F1/EM |
gpt-3.5-turbo as is with prompts from here: https://github.com/IlyaGusev/rulm/blob/master/self_instruct/src/eval_rsg.py
temperature = 0.0 top_p = 1.0
Category | Score |
---|---|
LOGIC | 0.27724846114423907 |
KNOWLEDGE | 0.48593972117950296 |
PREDICATE-ARGUMENT STRUCTURE | 0.4222012507483124 |
LEXICAL SEMANTICS | 0.4507478531035058 |
Lexical Semantics - Lexical Entailment | 0.4587428680014169 |
---|---|
Lexical Semantics - Morphological Negation | 0.2140767569329586 |
Lexical Semantics - Factivity | 0.34020690871988585 |
Lexical Semantics - Symmetry/Collectivity | 0.6243713415848884 |
Lexical Semantics - Redundancy | 0.2211629342323457 |
Lexical Semantics - Named Entities | 0.6666666666666666 |
Lexical Semantics - Quantifiers | 0.37854910518078255 |
Predicate-Argument Structure Core Args | 0.5760964547890037 |
Predicate-Argument Structure Prepositional Phrases | 0.48034143356248 |
Predicate-Argument Structure Ellipsis/Implicits | 0.5493502655735357 |
Predicate-Argument Structure Anaphora/Coreference | 0.44034755759456745 |
Predicate-Argument Structure Active/Passive | 0.5771944181220839 |
Predicate-Argument Structure Nominalization | 0.40881490876633847 |
Predicate-Argument Structure Genitives/Partitives | 0.5773502691896257 |
Predicate-Argument Structure Datives | 0.629940788348712 |
Predicate-Argument Structure Relative Clauses | 0.33954987505086615 |
Predicate-Argument Structure Coordination Scopes | 0.13725270326150324 |
Predicate-Argument Structure Intersectivity | 0.33452515977294983 |
Predicate-Argument Structure Restrictivity | 0.040422604172722164 |
Logic Negation | 0.3221028323526659 |
Logic Double Negation | 0.2763853991962833 |
Logic Interval/Numbers | 0.12017278061240777 |
Logic Conjuction | 0.24809590313546123 |
Logic Disjunction | 0.053838190205816545 |
Logic Conditionals | -0.07100716024967263 |
Logic Universal | 0.5324675324675324 |
Logic Existential | 0.3851644432598216 |
Logic Temporal | 0.6386392673039035 |
Logic Upward Monotone | 0.3612343522752406 |
Logic Downward Monotone | 0.008988968316207744 |
Logic Non-Monotonic | 0.2895702534395041 |
Knowledge Common Sense | 0.45069440991558835 |
Knowledge World Knowledge | 0.5198996752635257 |
Dataset | Speed | RAM |
---|---|---|
LiDiRus | - | - |
RCB | - | - |
PARus | - | - |
MuSeRC | - | - |
TERRa | - | - |
RUSSE | - | - |
RWSD | - | - |
DaNetQA | - | - |
RuCoS | - | - |