Consider some Metaculus question you know little about. This might be whether the star KIC 9832227 will go "red nova", whether the 2048-bit RSA cryptosystem will be broken before 256-bit Elliptic Curve Cryptography, or whether Piracetam is a more effective Alzheimer's treatment than Memantine. Your guess might be that such a difficult question is about equally likely to resolve one way as the other.

You can expect the community to do much better. In fact, even though the community predictions are not always perfectly calibrated, you should expect the community to assign, on average, roughly 63% to questions that resolve positively and 37% to those that resolve negatively [1]. That's how much signal the Metaculus community is able to extract from what might seem like noise. Impressive, right?

The log score is a commonly used scoring rule, which (relative to the Brier score) gives a larger penalty for being confident (i.e. predicting near 1% or near 99%) but being wrong. Currently (as of 06/11/18), 276 questions have been resolved and the community log score is 0.167. The lower the score, the more precise and well-calibrated the predictions are.

It also seems that the Many Are Smarter Than the Few: the community log-score of 0.167 is currently lower (i.e. better) than the average log-score of 0.1694 achieved by the current top 25 predictors in the rankings.

**What will the community average log-score be after the resolution of the 500th question on Metaculus?**

The log-score for a single forecast of probability $p$ is computed as $-\log_2(p)/4$ if the event occurred, and $-\log_2(1-p)/4$ if not. The scaling is chosen such that it matches the Brier score for a 50% prediction: both give 0.25.
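
As a rough illustration of the scoring rule, here is a minimal Python sketch. It assumes the scaled log score is $-\log_2(q)/4$, where $q$ is the probability assigned to the outcome that actually happened. This is the scaling that equals the Brier score of 0.25 at a 50% prediction; it is not taken from Metaculus's own code. The function names `log_score` and `brier_score` are hypothetical.

```python
import math


def log_score(p: float, occurred: bool) -> float:
    """Scaled log score (assumed form): -log2(q) / 4, lower is better.

    q is the probability the forecast gave to the outcome that happened.
    """
    q = p if occurred else 1.0 - p
    return -math.log2(q) / 4.0


def brier_score(p: float, occurred: bool) -> float:
    """Brier score for a binary forecast: squared error, lower is better."""
    outcome = 1.0 if occurred else 0.0
    return (p - outcome) ** 2


# Compare the penalty for a forecast of p when the event does NOT occur.
for p in (0.5, 0.9, 0.99):
    print(
        f"p={p:.2f} and wrong: "
        f"log={log_score(p, False):.3f}, brier={brier_score(p, False):.3f}"
    )
```

The two rules agree at 50%, while for confident misses the log score keeps growing without bound and the Brier score never exceeds 1.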

[1] This is a back-of-the-envelope estimate. I assume that the success rates are on average 5% removed from predictions made (imperfect calibration). Then, given a log score of 0.167 you would expect if .
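
One way to arrive at a figure close to 63% is sketched below. It makes two simplifying assumptions: the log score of a forecast is $-\log_2(q)/4$, with $q$ the probability assigned to the realized outcome, and every forecast assigns the same $q$. It leaves out the 5% calibration adjustment mentioned above, so it is only an approximation of the footnote's estimate. The helper name `implied_probability` is hypothetical.

```python
def implied_probability(score: float) -> float:
    """Invert score = -log2(q) / 4 to recover q (assumed scaling)."""
    return 2.0 ** (-4.0 * score)


# Community log score quoted in the text.
community_q = implied_probability(0.167)
print(f"Implied probability on the realized outcome: {community_q:.3f}")
```

Strictly speaking, inverting an average log score gives the geometric mean of the probabilities assigned to the realized outcomes, not their arithmetic mean.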