Created by: isinlor
AI Demonstrations


Recently, Hendrycks et al. proposed a new test to measure a text model's multitask accuracy. The test covers 57 tasks, including elementary and college-level mathematics, computer science, law, accounting and more. For each task, the model is provided with only 5 training examples. The test set consists of around 5,000 to 10,000 questions, with 100 to 200 questions per task.

The test differs from benchmarks like SuperGLUE because it intentionally includes questions that require specialized expertise in a narrow field of knowledge. Many tasks would be difficult for an average human. See example questions below.

The authors found that the largest GPT-3 model achieves 43.9% accuracy versus a 25% random baseline, while UnifiedQA, with 11B parameters and fine-tuned on other QA tasks, achieves 48.9%. Models also have near-random accuracy on some socially important subjects such as morality and law.
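To make the comparison with the random baseline concrete, here is a minimal sketch of how accuracy on multiple-choice questions could be scored. It assumes four answer options per question (inferred from the 25% baseline, not stated above). The function names and the toy answer key are hypothetical and are not taken from the benchmark's code.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of questions where the predicted option matches the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def random_baseline(num_options: int) -> float:
    """Expected accuracy of guessing uniformly among num_options choices."""
    if num_options < 1:
        raise ValueError("num_options must be positive")
    return 1.0 / num_options


# Toy data for illustration only; these are not benchmark questions.
answers = ["A", "C", "B", "D", "A", "B", "C", "D"]
predictions = ["A", "C", "D", "D", "B", "B", "A", "C"]

score = accuracy(predictions, answers)
baseline = random_baseline(4)  # assumption: four options per question
print(f"accuracy={score:.1%}, baseline={baseline:.1%}, margin={score - baseline:+.1%}")
```

Under this assumption, a model scoring 43.9% is about 19 percentage points above chance, and "near-random" on a subject means a score close to 25%.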

The question asks: