Submission HUMAN BENCHMARK



Total score: 0.802

Dataset      Score        Metric
Diagnostic   0.626        Matthews Corr
Russian CB   0.68/0.702   F1/Acc
PARus        0.982        Accuracy
MuSeRC       0.806/0.42   F1a/EM
TERRa        0.92         Accuracy
RUSSE        0.747        Accuracy
RWSD         0.84         Accuracy
DaNetQA      0.879        Accuracy
RuCoS        0.93/0.924   F1/EM
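
The page does not state how the total score is derived from the per-dataset scores. The sketch below is one plausible reading, not the confirmed leaderboard formula: it assumes that each dataset's metrics are averaged first and that the total is the unweighted mean over all nine datasets, the Diagnostic set included. The names `TASK_SCORES` and `overall_score` are hypothetical and used only for illustration.

```python
from statistics import mean

# Scores copied from the table above. Datasets with two metrics list both values.
TASK_SCORES = {
    "Diagnostic": [0.626],
    "Russian CB": [0.68, 0.702],
    "PARus": [0.982],
    "MuSeRC": [0.806, 0.42],
    "TERRa": [0.92],
    "RUSSE": [0.747],
    "RWSD": [0.84],
    "DaNetQA": [0.879],
    "RuCoS": [0.93, 0.924],
}


def overall_score(task_scores):
    """Assumed aggregation: average the metrics within each dataset,
    then take the unweighted mean across datasets."""
    per_task = [mean(values) for values in task_scores.values()]
    return mean(per_task)


total = overall_score(TASK_SCORES)
print(f"{total:.3f}")
```

Under this assumption the result is roughly 0.803, close to the reported 0.802. The small gap could come from rounding in the displayed per-dataset values, or from a different aggregation rule than the one assumed here.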
Model description:

Human performance on all test sets. All tasks were completed on Yandex.Toloka. Instructions and task examples are available in our repo.


Parameter description:

Human performance on all test sets


Diagnostic (Matthews Correlation): 0.626

Category Score
LOGIC
KNOWLEDGE
PREDICATE-ARGUMENT STRUCTURE
LEXICAL SEMANTICS
Lexical Semantics - Lexical Entailment
Lexical Semantics - Morphological Negation
Lexical Semantics - Factivity
Lexical Semantics - Symmetry/Collectivity
Lexical Semantics - Redundancy
Lexical Semantics - Named Entities
Lexical Semantics - Quantifiers
Predicate-Argument Structure - Core Args
Predicate-Argument Structure - Prepositional Phrases
Predicate-Argument Structure - Ellipsis/Implicits
Predicate-Argument Structure - Anaphora/Coreference
Predicate-Argument Structure - Active/Passive
Predicate-Argument Structure - Nominalization
Predicate-Argument Structure - Genitives/Partitives
Predicate-Argument Structure - Datives
Predicate-Argument Structure - Relative Clauses
Predicate-Argument Structure - Coordination Scopes
Predicate-Argument Structure - Intersectivity
Predicate-Argument Structure - Restrictivity
Logic - Negation
Logic - Double Negation
Logic - Interval/Numbers
Logic - Conjunction
Logic - Disjunction
Logic - Conditionals
Logic - Universal
Logic - Existential
Logic - Temporal
Logic - Upward Monotone
Logic - Downward Monotone
Logic - Non-Monotonic
Knowledge - Common Sense
Knowledge - World Knowledge
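
The Diagnostic score above is reported as a Matthews correlation coefficient. For readers unfamiliar with the metric, here is a minimal sketch of the standard binary formula computed from confusion-matrix counts. It assumes labels encoded as 0/1; the function name `matthews_corr` and the sample labels are hypothetical and do not come from the benchmark data.

```python
import math


def matthews_corr(y_true, y_pred):
    """Binary Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative labels only, not benchmark data.
gold = [1, 1, 0, 0]
pred = [1, 0, 0, 0]
mcc = matthews_corr(gold, pred)
print(round(mcc, 3))
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction correct), which makes it more robust to class imbalance than plain accuracy.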