5 comments
21 forecasters
In the following years, what will be the highest LLM scores on the GPQA Diamond benchmark?
Opened: Mar 28, 2024
Closes: Jan 1, 2028
Scheduled resolution: Jan 1, 2028
What will be the best score by an AI on the full Humanity's Last Exam (HLE) before 2026?
50.1%
(39 - 61.1)
60 forecasters
What will be the best non-human SAT-style score on the hard subset of the QuALITY dataset by January 1, 2030?
96.9%
(92.6 - 98.8)
11 forecasters
What will be the best non-human SAT-style score on the hard subset of the QuALITY dataset by January 1, 2040?
99.1%
(96.3 - 99.7)
22 forecasters