Metaculus Track Record
One of the core ideas of Metaculus is that predictors should be held accountable for their predictions. A predictor that consistently makes good predictions should be rewarded, while overconfident predictors should lose standing. In that spirit, we present here the track record of the Metaculus system itself.
The first graph shows every resolved question: when it resolved, how it resolved, and either the Metaculus score or the community score (Brier or log) for that question. Lower scores are better. The line shows a moving average of the scores over time. The Metaculus postdiction shows what our current algorithm would have predicted had it and its calibration data been available when the question closed.
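For readers who want to reproduce per-question scores, here is a minimal Python sketch of the two scoring rules and a trailing moving average. It assumes a binary question with predicted probability p of a positive resolution, and it takes the log score to be the negative natural log of the probability assigned to the actual outcome, so that, like the Brier score, lower is better. The log base, the clipping constant, the window length and the function names are all assumptions for illustration, not necessarily what our graphs use.

```python
import math
from collections import deque

# Hypothetical helpers; the scoring conventions are assumptions, not
# necessarily the exact ones used for the Metaculus graphs.

def brier_score(p: float, outcome: bool) -> float:
    """Squared error between probability p and the 0/1 outcome (lower is better)."""
    return (p - (1.0 if outcome else 0.0)) ** 2

def log_score(p: float, outcome: bool, eps: float = 1e-12) -> float:
    """Negative natural log of the probability given to the actual outcome.

    The probability is clipped to at least eps so that a 0% prediction on a
    question that resolved positively gives a large finite score instead of
    a math domain error.
    """
    prob_of_outcome = p if outcome else 1.0 - p
    return -math.log(max(prob_of_outcome, eps))

def moving_average(values, window: int) -> list:
    """Trailing mean over the last `window` values (shorter at the start)."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out

# Example: (probability, resolved positively?) pairs, ordered by resolution time.
history = [(0.8, True), (0.3, False), (0.9, False), (0.6, True)]
briers = [brier_score(p, o) for p, o in history]
print(moving_average(briers, window=2))
```

A trailing window is used here so that each point on the line depends only on questions that had already resolved; a centered window would be an equally reasonable choice for a retrospective plot.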
The second graph is a histogram that shows the distribution of scores more clearly, while the third graph groups the Metaculus and community predictions into bins and shows how well calibrated each bin is. For example, if a perfectly calibrated predictor predicts 80% on 10 different questions, we expect 8 of those 10 to resolve positively. An overconfident predictor, by contrast, might see only 6 of the 10 resolve positively, while an underconfident predictor might see all 10 resolve positively. An ideal predictor would predict only 0% or 100% and be right every time, so it would have no data at all in the center of the graph. The range of each bar shows the approximate 1σ confidence interval for that bin's calibration.
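The calibration bins can be sketched the same way. This minimal version assumes ten equal-width bins and uses the binomial standard error sqrt(f(1 - f)/n) of each bin's observed frequency f as the approximate 1σ interval; our actual bin edges and interval formula may differ, and `calibration_bins` is a hypothetical name.

```python
import math

def calibration_bins(predictions, n_bins=10):
    """Group (probability, outcome) pairs into equal-width probability bins.

    Hypothetical helper. Returns one (bin_low, bin_high, n, observed_freq, sigma)
    tuple per non-empty bin, where sigma is the binomial standard error
    sqrt(f * (1 - f) / n) of the observed frequency f (an assumed 1-sigma width).
    """
    bins = [[] for _ in range(n_bins)]
    for p, outcome in predictions:
        # min() keeps p == 1.0 in the top bin instead of an out-of-range index.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append(outcome)

    result = []
    for i, outcomes in enumerate(bins):
        if not outcomes:
            continue  # skip empty bins, e.g. the center for an ideal predictor
        n = len(outcomes)
        freq = sum(outcomes) / n  # True counts as 1, False as 0
        sigma = math.sqrt(freq * (1 - freq) / n)
        result.append((i / n_bins, (i + 1) / n_bins, n, freq, sigma))
    return result

# The 80% example from the text: 8 positive and 2 negative resolutions.
example = [(0.8, True)] * 8 + [(0.8, False)] * 2
print(calibration_bins(example))
```

On that example it returns a single bin with n = 10, an observed frequency of 0.8 and σ ≈ 0.13. This simple interval collapses to zero width when a bin is all-positive or all-negative; a Wilson score interval is a common alternative that avoids this.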
These graphs are interactive. You can click an individual data point to see which question it refers to, and you can click a calibration bin to highlight the data points in that bin. You can also filter by date and category to see the track record for a subset of questions.
Note: The Metaculus prediction, as distinct from the community prediction, wasn't developed until June 2017. At that time, every question that had closed but not yet resolved was given a Metaculus prediction; from then on, the Metaculus prediction was updated only for open questions.
For each question, the Metaculus postdiction uses data from all other questions to calibrate its result, even questions that resolved later.
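The leave-one-out structure can be illustrated with a deliberately simple stand-in. The postdiction algorithm itself is not described on this page, so the sketch below substitutes histogram binning: each question's raw probability is replaced by the positive-resolution rate among all other questions in the same probability bin, whether they resolved earlier or later. The function name, the binning method and the fallback rule are hypothetical.

```python
def leave_one_out_postdictions(predictions, n_bins=10):
    """Hypothetical leave-one-out recalibration by histogram binning.

    predictions: list of (raw_probability, outcome) pairs.
    For question i, returns the observed positive rate among the OTHER
    questions whose raw probability falls in the same bin, or the raw
    probability itself if no other question shares that bin.
    """
    def bin_of(p):
        return min(int(p * n_bins), n_bins - 1)

    # Totals over all questions; question i is subtracted out below.
    counts = [0] * n_bins
    positives = [0] * n_bins
    for p, outcome in predictions:
        b = bin_of(p)
        counts[b] += 1
        positives[b] += int(outcome)

    postdictions = []
    for p, outcome in predictions:
        b = bin_of(p)
        n_other = counts[b] - 1
        pos_other = positives[b] - int(outcome)
        postdictions.append(pos_other / n_other if n_other > 0 else p)
    return postdictions

sample = [(0.8, True), (0.8, True), (0.8, False), (0.2, False)]
print(leave_one_out_postdictions(sample))
```

Excluding only the question being scored, rather than all questions that resolved after it, mirrors the note above: the calibration data includes later resolutions, which a live forecast would not have had.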