Created by: jacob.steinhardt
AI Technical Benchmarks

The Massive Multitask Language Understanding (MMLU) dataset is a collection of high school, college, and professional multiple-choice exams that test expert subject knowledge. It was constructed by Hendrycks et al. (2021). Hypermind forecasters were commissioned to predict state-of-the-art performance on June 30 of 2022, 2023, 2024, and 2025. The 2022 result of 67.5% fell significantly outside the forecasters' prediction intervals, so we are asking for updated forecasts for 2023, 2024, and 2025.