This question is part of the Maximum Likelihood Round of the Forecasting AI Progress Tournament. You can view all other questions in this round here.
SuperGLUE (Wang et al., 2019) is a benchmark for evaluating general-purpose language understanding systems. The set of eight tasks in the benchmark emphasises diverse task formats and low-data training, with nearly half of the tasks having fewer than 1k training examples and all but one having fewer than 10k.
As of writing this question, the state-of-the-art model is T5: Text-to-Text Transfer Transformer (Raffel et al., 2019), which achieves an average score of 89.3, just below the human baseline of 89.8.
The SuperGLUE leaderboard may be accessed here.