Question

What will be the best non-human SAT-style score on the hard subset of the QuALITY dataset by January 1, 2030?

Comments: 1

Total Forecasters: 11

Community Prediction

97%

(93.1% - 98.9%)

	lower 25%	median	upper 75%
community	93.06%	96.99%	98.85%

Authors: elifland

Opened: May 27, 2022

Closes: Jan 2, 2030

Scheduled resolution: Jan 2, 2030

Spot Scoring Time: May 29, 2022

AI Technical Benchmarks

Computing and Math

Artificial Intelligence

What will be the best non-human SAT-style score on the hard subset of the QuALITY dataset by January 1, 2040?

98.2

What will be the best score by an AI on the full Humanity's Last Exam (HLE) before 2026?

47.5

In the following years, what will be the highest LLM scores on the GPQA Diamond benchmark?

92.7
