What will be the state-of-the-art language modelling performance (in perplexity) on WikiText-103 by the following dates?
34 comments
103 forecasters
This question is closed for predictions, and is waiting to be resolved
Authors:
Opened: Dec 14, 2020
Closed: Feb 13, 2025
Scheduled resolution: Dec 13, 2026
What will be state-of-the-art accuracy on the Massive Multitask dataset on the following dates?
94.2
What will state-of-the-art top-1 accuracy on the APPS Benchmark introductory problems be from 2022 to 2025?
91.5
In the following years, what will be the highest LLM scores on the GPQA Diamond benchmark?
92.7