Name | Identifier | Type of the task | Metrics | License | Download | HB score |
---|---|---|---|---|---|---|
Russian DaNetQA | DaNetQA | Binary Classification, Natural Language Inference | Accuracy | MIT License | | 0.915
DaNetQA is a question answering dataset for yes/no questions. These questions are naturally occurring: they are generated in unprompted and unconstrained settings.
Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks.
By sampling questions from a distribution of information-seeking queries (rather than prompting annotators for text pairs), we observe significantly more challenging examples compared to existing NLI datasets.
The task requires logic, commonsense, and world knowledge. It is framed as binary classification with true/false labels.
```json
{
    "text": "В период с 1969 по 1972 год по программе «Аполлон» было выполнено 6 полётов с посадкой на Луне. Всего на Луне высаживались 12 астронавтов США. Список космонавтов Список космонавтов — участников орбитальных космических полётов Список астронавтов США — участников орбитальных космических полётов Список космонавтов СССР и России — участников космических полётов Список женщин-космонавтов Список космонавтов, посещавших МКС Энциклопедия астронавтики.",
    "question": "Был ли человек на луне?",
    "label": true,
    "idx": 5
}
```
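In the example above, the question "Был ли человек на луне?" ("Has a human been on the Moon?") is answered true by the passage about the Apollo landings. Below is a minimal sketch of how a split in this format could be read and turned into NLI-style text pairs for a binary classifier. It assumes the split is stored locally as a JSONL file; the path `DaNetQA/train.jsonl` is hypothetical.

```python
import json
from pathlib import Path


def load_danetqa(path):
    """Read a DaNetQA split stored as one JSON object per line (JSONL)."""
    examples = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples


def to_text_pair(example):
    """Frame an example as an NLI-style text pair: the passage ("text") acts
    as the premise, the question as the hypothesis, and the label is True
    (yes) or False (no); a held-out test split may omit the label."""
    return example["text"], example["question"], example.get("label")


# Hypothetical path; point this at wherever the split is stored locally.
examples = load_danetqa("DaNetQA/train.jsonl")
premise, hypothesis, label = to_text_pair(examples[0])
print(hypothesis, "->", label)
```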
All text examples were collected following the methodology used for the original dataset. Answers to the questions were obtained from human assessors, while the passages were retrieved automatically from Wikipedia using ODQA systems. Human assessment was carried out on Yandex.Toloka.
*Additionally, to increase the number of samples and balance the distribution of yes/no answers, we added extra data in the same format (collected on Yandex.Toloka during the creation of the MuSeRC dataset).
For comparison, the English counterpart of DaNetQA reports an accuracy of 91.2%.
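Since the only metric is accuracy, evaluation reduces to comparing predicted and gold boolean labels. The sketch below is illustrative only; the function name is not part of any official evaluation script.

```python
def accuracy(predictions, gold_labels):
    """Fraction of examples whose predicted yes/no label matches the gold label."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold labels must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


# Toy check: three of four predictions agree with the gold labels -> 0.75.
print(accuracy([True, False, True, True], [True, False, False, True]))
```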