39 comments
47 forecasters
What will be state-of-the-art accuracy on the Massive Multitask dataset on the following dates?
This question is closed for predictions, and is waiting to be resolved
Authors:
Opened: Jul 4, 2022
Closed: Jun 29, 2025
Scheduled resolution: Jun 30, 2025
In the following years, what will be the highest LLM scores on the GPQA Diamond benchmark?
20 forecasters
What will be the best score by an AI on the full Humanity's Last Exam (HLE) before 2026?
60.8%
(51.6 - 72.1)
47 forecasters
What will be the state-of-the-art performance on image classification on ImageNet in top-1 accuracy on the following dates?
78 forecasters