Submission HUMAN BENCHMARK

Nov. 12, 2020, 11:57 a.m.

Team: AGI NLP

Model url: https://github.com/RussianNLP/RussianSuperGLUE


Total score: 0.811

Dataset Score Metric
LiDiRus 0.626 Matthews Corr
RCB 0.68 / 0.702 F1/Acc
PARus 0.982 Accuracy
MuSeRC 0.806 / 0.42 F1a/EM
TERRa 0.92 Accuracy
RUSSE 0.805 Accuracy
RWSD 0.84 Accuracy
DaNetQA 0.915 Accuracy
RuCoS 0.93 / 0.89 F1/EM
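
The total score of 0.811 appears consistent with a simple macro-average of the table above: the two metrics of a multi-metric task are averaged first, then the nine per-task values are averaged. This aggregation rule is an assumption inferred from the numbers, not something stated in this submission. A minimal Python sketch (the `scores` dictionary and `total_score` function are illustrative names, not part of the benchmark code):

```python
# Scores copied from the table above; tasks with two metrics list both values.
scores = {
    "LiDiRus": [0.626],
    "RCB": [0.68, 0.702],
    "PARus": [0.982],
    "MuSeRC": [0.806, 0.42],
    "TERRa": [0.92],
    "RUSSE": [0.805],
    "RWSD": [0.84],
    "DaNetQA": [0.915],
    "RuCoS": [0.93, 0.89],
}


def total_score(task_scores):
    """Average the metrics within each task, then average across tasks.

    Assumed aggregation rule; it reproduces the reported total here.
    """
    per_task = [sum(values) / len(values) for values in task_scores.values()]
    return sum(per_task) / len(per_task)


total = round(total_score(scores), 3)
print(total)  # 0.811
```

Under this assumption the diagnostic set (LiDiRus) counts toward the total; leaving it out would give a different value from the one reported.
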
Model description:

HUMAN BENCHMARK for version 1.2


Parameter description:

Diagnostic (Matthews Correlation): 0.626

Category Score
LOGIC
KNOWLEDGE
PREDICATE-ARGUMENT STRUCTURE
LEXICAL SEMANTICS
Lexical Semantics - Lexical Entailment
Lexical Semantics - Morphological Negation
Lexical Semantics - Factivity
Lexical Semantics - Symmetry/Collectivity
Lexical Semantics - Redundancy
Lexical Semantics - Named Entities
Lexical Semantics - Quantifiers
Predicate-Argument Structure - Core Args
Predicate-Argument Structure - Prepositional Phrases
Predicate-Argument Structure - Ellipsis/Implicits
Predicate-Argument Structure - Anaphora/Coreference
Predicate-Argument Structure - Active/Passive
Predicate-Argument Structure - Nominalization
Predicate-Argument Structure - Genitives/Partitives
Predicate-Argument Structure - Datives
Predicate-Argument Structure - Relative Clauses
Predicate-Argument Structure - Coordination Scopes
Predicate-Argument Structure - Intersectivity
Predicate-Argument Structure - Restrictivity
Logic - Negation
Logic - Double Negation
Logic - Interval/Numbers
Logic - Conjunction
Logic - Disjunction
Logic - Conditionals
Logic - Universal
Logic - Existential
Logic - Temporal
Logic - Upward Monotone
Logic - Downward Monotone
Logic - Non-Monotonic
Knowledge - Common Sense
Knowledge - World Knowledge

Performance:

Dataset Speed RAM
LiDiRus - -
RCB - -
PARus - -
MuSeRC - -
TERRa - -
RUSSE - -
RWSD - -
DaNetQA - -
RuCoS - -