Question
What will be the best perplexity score by a language model on the Penn Treebank (Word Level) by the end of 2024?
Resolved: Annulled
Total Forecasters: 18
Community Prediction
19.9
(12.7 - 20.4)
Quartiles | community |
lower 25% | 12.66 |
median | 19.95 |
upper 75% | 20.37 |
What was the final result? Annulled
Authors:
Opened: Sep 24, 2021
Closes: Jan 1, 2025
Resolves: Jan 1, 2025
Spot Scoring Time: Sep 26, 2021
When will a language model be developed that, when tested, yields approximately human-level output?
05 Jun 2024
In the following years, what will be the highest LLM scores on the GPQA Diamond benchmark?
92.7
What will be the best non-human SAT-style score on the hard subset of the QuALITY dataset by January 1, 2030?
97