Created by: MetaculusOutlooks
Forecasting AI Progress

This question is part of the Maximum Likelihood Round of the Forecasting AI Progress Tournament. You can view all other questions in this round here.

SuperGLUE (Wang et al., 2019) is a benchmark for evaluating general-purpose language understanding systems. Its eight tasks emphasise diverse task formats and limited training data: nearly half of the tasks have fewer than 1k examples, and all but one have fewer than 10k examples.

As of writing this question, the state-of-the-art model for SuperGLUE is T5: Text-To-Text Transfer Transformer (Raffel et al., 2019), which achieves an average score of 89.3, just below the human baseline of 89.8.

The SuperGLUE leaderboard may be accessed here.