55 forecasters
Will any AI model achieve a score of 94% or higher on the GPQA Diamond Benchmark Leaderboard before February 1, 2026?
Resolved: No
Significant compute resources: increases likelihood
Time constraint may hinder testing: decreases likelihood