To evaluate your model on Russian SuperGLUE, collect your system's predictions on the eight primary tasks and the diagnostic task (which should generally use your TERRa classifier, the Russian counterpart of RTE).
  • Download the data for all tasks from the "Tasks" section.
  • Use the IDs present in the unlabeled test JSONLs to generate one JSONL file of predictions for each test file. Each line of a prediction file should be a JSON object with an `idx` field identifying the example and a `label` field holding the prediction, with lines sorted by `idx`.
  • Add a link to your model (code or a paper). This is mandatory: we need to verify your model before it can appear on the leaderboard.
  • Make sure that each prediction JSONL is named according to the following:
    • DaNetQA: DaNetQA.jsonl
    • CommitmentBank: RCB.jsonl
    • PARus: PARus.jsonl
    • MuSeRC: MuSeRC.jsonl
    • RuCoS: RuCoS.jsonl
    • TERRa: TERRa.jsonl
    • Words in Context: RUSSE.jsonl
    • Winograd Schema Challenge: RWSD.jsonl
    • Broad Coverage Diagnostics: LiDiRus.jsonl
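The steps above can be scripted. The sketch below is a minimal illustration, not an official tool. It assumes your predictions for each task are already in a plain `{idx: label}` dictionary, and it writes them in the flat `idx`/`label` format described above. Check the sample submission for the exact shape each task expects, since tasks with nested examples may differ. The helper names `write_predictions` and `make_submission_zip` and the placeholder labels are hypothetical. Only the file names come from the list above.

```python
import json
import tempfile
import zipfile
from pathlib import Path

# Task name -> required prediction file name (taken from the list above).
FILE_NAMES = {
    "DaNetQA": "DaNetQA.jsonl",
    "CommitmentBank": "RCB.jsonl",
    "PARus": "PARus.jsonl",
    "MuSeRC": "MuSeRC.jsonl",
    "RuCoS": "RuCoS.jsonl",
    "TERRa": "TERRa.jsonl",
    "Words in Context": "RUSSE.jsonl",
    "Winograd Schema Challenge": "RWSD.jsonl",
    "Broad Coverage Diagnostics": "LiDiRus.jsonl",
}


def write_predictions(task, predictions, out_dir):
    """Write {idx: label} pairs as JSON lines sorted by idx; return the path."""
    path = Path(out_dir) / FILE_NAMES[task]
    with path.open("w", encoding="utf-8") as f:
        for idx in sorted(predictions):
            record = {"idx": idx, "label": predictions[idx]}
            # ensure_ascii=False keeps any non-ASCII text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def make_submission_zip(out_dir, zip_path):
    """Zip all prediction files so they sit at the top level of the archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in FILE_NAMES.values():
            # arcname=name drops the directory part, keeping files at top level.
            zf.write(Path(out_dir) / name, arcname=name)
    return zip_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        dummy = {1: "placeholder", 0: "placeholder"}  # hypothetical labels
        for task in FILE_NAMES:
            write_predictions(task, dummy, out_dir)
        zip_path = make_submission_zip(out_dir, Path(out_dir) / "submission.zip")
        with zipfile.ZipFile(zip_path) as zf:
            print(sorted(zf.namelist()))
```

Passing `arcname` to `ZipFile.write` keeps every file at the top level of the archive. Without it, the files would be stored under the temporary directory's path.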

You may upload at most two submissions a day, and at most ten submissions per month. A sample submission with the necessary formatting is available here.

Submitted systems may use any public or private data during development, with a few exceptions:

  • Systems may only use the versions of the task datasets distributed by Russian SuperGLUE. In some cases these use different train/validation/test splits from other public versions, and they may omit metadata that is available elsewhere.
  • Systems may not use the example sentences from the WordNet, VerbNet, or Wiktionary resources as structured data during training or evaluation. Using these sentences as isolated sentences of raw text is acceptable. However, using them alongside any contextual information, such as sense or synset identity, can give an unfair advantage on the Words in Context (RUSSE) task.
  • Systems may not use the unlabeled test data for the Russian SuperGLUE tasks in system development in any way. They also may not share information across separate test examples in any way.

Beyond these restrictions, you may submit results from any kind of system that can produce labels for the eight primary tasks and the diagnostic task. This includes systems that share no components across tasks and systems that are not based on machine learning.

System submissions are not automatically public. Once you mark a submission as public, the Russian SuperGLUE admins are notified and will review it. Once it is approved, you will be notified and your entry will appear on the leaderboard. If you update your entry in the future, it must go through this process again, so please make sure that everything is correct before you submit your entry for approval.

We will only make submissions public if they include either a link to a paper or a short text description. In addition, to ensure reasonable credit assignment since SuperGLUE builds very directly on prior work, we ask the authors of submitted systems to directly name and cite the specific datasets that they use, including the SuperGLUE datasets. We will enforce this as a requirement for papers linked from the leaderboard.

Yes, you can submit anonymously. We currently display names for all submissions, but you may create a Google account with a placeholder name if you prefer. Other users cannot ask anonymous authors questions, so every anonymous submission must include a link to a reasonably detailed (anonymized) paper.

The primary SuperGLUE tasks are built on existing datasets or derived from them. For the Russian version, we created the equivalent datasets from scratch. All our datasets are released under the MIT License.

If you have just submitted, please wait at least 5 minutes for the grader to run and score your submission.

First, check in your profile whether your submission is listed. If its status is "error", hover over the error symbol to see the message.

If the submission is not listed, check the issues below or contact us.

A submission may not be graded in case of any of the following issues:

  • The top-level directory of the zip archive does not contain a file for every task.
  • An example ID is missing for any task. The IDs in each task's JSONL are consecutive integers starting from 0. Make sure to use the same IDs as in the test JSONLs.
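Both issues can be checked locally before uploading. The sketch below is a hypothetical pre-submission checker, not the grader's actual logic. It assumes each line of both the prediction files and the test JSONLs carries an `idx` field, and that you have already collected the test IDs per file name. The function names and the wording of the problem messages are illustrative.

```python
import json
import zipfile


def read_ids(lines):
    """Collect the idx values from an iterable of JSONL lines."""
    return [json.loads(line)["idx"] for line in lines if line.strip()]


def check_submission(zip_file, expected_files, test_ids):
    """Return a list of problems that could prevent grading.

    zip_file: path or binary file object of the submission archive.
    expected_files: file names that must sit at the top level of the zip.
    test_ids: {file_name: list of idx values from the matching test JSONL}.
    """
    problems = []
    with zipfile.ZipFile(zip_file) as zf:
        names = set(zf.namelist())
        for name in expected_files:
            if name not in names:
                problems.append(f"missing file at top level: {name}")
                continue
            predicted = read_ids(zf.read(name).decode("utf-8").splitlines())
            # Comparing sorted lists also catches duplicated or extra IDs.
            if sorted(predicted) != sorted(test_ids.get(name, [])):
                problems.append(f"{name}: IDs do not match the test file")
            elif predicted != sorted(predicted):
                problems.append(f"{name}: lines are not sorted by idx")
    return problems
```

An empty list means neither of the issues above was detected. It does not guarantee that the grader will accept the file.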

We score each task with its own metric or metrics. For tasks with multiple metrics, the metrics are averaged into a single task score. The task scores are then averaged to get the final score. By default, the leaderboard shows and ranks only each user's top-scoring submission; other submissions can be seen in the expanded view for that user. Competitors may submit privately, which keeps their results off the public leaderboard. To show your results on the leaderboard, click the “Public” checkbox.
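The averaging described above can be sketched as a two-level mean. This is an illustration under stated assumptions, not the leaderboard's code. It assumes every metric is already on the same scale and that each task counts equally. It also averages whatever tasks you pass in, because which tasks enter the final average is not spelled out here. The task names and metric values are hypothetical.

```python
from statistics import mean


def task_score(metrics):
    """Average a task's metrics into one number (a single metric stays as is)."""
    return mean(metrics.values())


def overall_score(per_task_metrics):
    """Average the per-task scores, giving every task equal weight."""
    return mean(task_score(m) for m in per_task_metrics.values())


# Hypothetical task names and metric values, for illustration only.
example = {
    "task_a": {"accuracy": 0.7},
    "task_b": {"f1": 0.6, "em": 0.4},
}
print(overall_score(example))
```

Averaging within a task first means that a task reporting two metrics does not count twice in the final score.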

Yes, you can use a pretrained jiant model. Our Jupyter notebook shows how to do it.

Contact us at russiansuperglue@gmail.com.