Group
In the following years, what will be the highest LLM scores on the GPQA Diamond benchmark?
Forecast Timeline
Total Forecasters 18
Authors:
Opened: Mar 28, 2024
Closes: Jan 1, 2028
Scheduled resolution: Jan 1, 2028
What will be the best score by an AI on the full Humanity's Last Exam (HLE) before 2026?
56.3
What will be the best non-human SAT-style score on the hard subset of the QuALITY dataset by January 1, 2030?
97
What will state-of-the-art top-1 accuracy on the APPS Benchmark introductory problems be from 2022 to 2025?
91.5