Group
What will be state-of-the-art accuracy on the Massive Multitask dataset on the following dates?
This question is closed for predictions, and is waiting to be resolved
Authors:
Opened: Jul 4, 2022
Closed: Jun 29, 2025
Scheduled resolution: Jun 30, 2025
In the following years, what will be the highest LLM scores on the GPQA Diamond benchmark?
93.6
What will be the best score by an AI on the full Humanity's Last Exam (HLE) before 2026?
61.8
What will be the state-of-the-art performance on image classification on ImageNet in top-1 accuracy on the following dates?
91.9