Submission HUMAN BENCHMARK

June 4, 2020, 3:16 p.m.

Team: AGI NLP

Model url: https://github.com/RussianNLP/RussianSuperGLUE/tree/master/HumanBenchmark


Total score: 0.8

Dataset   Score          Metric
LiDiRus   0.626          Matthews Corr
RCB       0.68 / 0.702   F1 / Acc
PARus     0.982          Accuracy
MuSeRC    0.806 / 0.42   F1a / EM
TERRa     0.92           Accuracy
RUSSE     0.747          Accuracy
RWSD      0.84           Accuracy
DaNetQA   0.879          Accuracy
RuCoS     0.93 / 0.89    F1 / EM
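
The page does not state how the total score is derived from the per-task scores. A minimal sketch, assuming the common SuperGLUE-style convention (average the metrics within each task, then take the unweighted mean over all nine tasks), gives a value consistent with the reported 0.8. The names `TASK_SCORES`, `task_score`, and `total_score` are hypothetical helpers for illustration, not part of the benchmark code.

```python
# Per-task scores copied from the table above.
# Tasks that report two metrics list both values.
TASK_SCORES = {
    "LiDiRus": [0.626],
    "RCB": [0.68, 0.702],
    "PARus": [0.982],
    "MuSeRC": [0.806, 0.42],
    "TERRa": [0.92],
    "RUSSE": [0.747],
    "RWSD": [0.84],
    "DaNetQA": [0.879],
    "RuCoS": [0.93, 0.89],
}


def task_score(metrics):
    """Average the metrics reported for a single task."""
    return sum(metrics) / len(metrics)


def total_score(task_scores):
    """Unweighted mean of per-task scores (assumed aggregation rule)."""
    per_task = [task_score(m) for m in task_scores.values()]
    return sum(per_task) / len(per_task)


print(round(total_score(TASK_SCORES), 3))  # prints 0.801
```

Under this assumption the mean is about 0.801, which rounds to the total of 0.8 shown above. If the leaderboard uses a different rule (for example, excluding the diagnostic set), the result would differ.
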
Model description:

Human performance on all test sets. All tasks were completed by annotators on Yandex.Toloka. The task instructions and examples are available in our repo.


Parameter description:

Human performance on all test sets.

Diagnostic (Matthews Correlation): 0.626
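
For reference, the Matthews correlation coefficient for a binary task is (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes it from scratch, assuming binary 0/1 labels; the function name `matthews_corr` and the toy label lists are illustrative only and are not taken from the benchmark data.

```python
import math


def matthews_corr(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (made-up labels): 3 TP, 3 TN, 1 FP, 1 FN.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(matthews_corr(gold, pred))  # prints 0.5
```

A value of 1 means perfect agreement with the gold labels, 0 means agreement no better than chance, and -1 means complete disagreement.
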

Category Score
LOGIC
KNOWLEDGE
PREDICATE-ARGUMENT STRUCTURE
LEXICAL SEMANTICS
Lexical Semantics - Lexical Entailment
Lexical Semantics - Morphological Negation
Lexical Semantics - Factivity
Lexical Semantics - Symmetry/Collectivity
Lexical Semantics - Redundancy
Lexical Semantics - Named Entities
Lexical Semantics - Quantifiers
Predicate-Argument Structure - Core Args
Predicate-Argument Structure - Prepositional Phrases
Predicate-Argument Structure - Ellipsis/Implicits
Predicate-Argument Structure - Anaphora/Coreference
Predicate-Argument Structure - Active/Passive
Predicate-Argument Structure - Nominalization
Predicate-Argument Structure - Genitives/Partitives
Predicate-Argument Structure - Datives
Predicate-Argument Structure - Relative Clauses
Predicate-Argument Structure - Coordination Scopes
Predicate-Argument Structure - Intersectivity
Predicate-Argument Structure - Restrictivity
Logic - Negation
Logic - Double Negation
Logic - Interval/Numbers
Logic - Conjunction
Logic - Disjunction
Logic - Conditionals
Logic - Universal
Logic - Existential
Logic - Temporal
Logic - Upward Monotone
Logic - Downward Monotone
Logic - Non-Monotonic
Knowledge - Common Sense
Knowledge - World Knowledge