# New Metaculus Tournament Scoring System Pt. 1

We have some big news: we’re massively overhauling our tournament scoring system. Why this change? We’ll get into the details below, but at a high level we’re striving to create a fairer system that better rewards forecaster skill. Getting the scoring system right is crucial for forecaster motivation, and we want to be the best platform for forecasters. We believe this will lead to even more accurate forecasts with terrific commentary, making Metaculus the best place to learn what is going on in the world while helping the world make better decisions.

This post explains how the new Metaculus tournament scoring rule works via a simple example. The new system builds on the incentive-compatible Kelly betting rule that was created in April 2021 and used to calculate the leaderboard in the recently completed Trade Signal Tournament. It keeps the same basic structure and adds some features to better reward the best forecasters.

The details below are also covered in this video explainer.

The old tournament scoring system had a weakness: a simple bot that copies the community prediction on every question ends up doing quite well and winning real money, while providing no value. In the Trade Signal Tournament, such a bot would have finished in first place and won around 15% of the prize pool. More alarmingly, if someone made such a bot (which we believe current users can easily do) and shared that bot with 10 friends, those 10 bots would almost certainly consume the majority of the prize pool, again while providing no value.

Our updated tournament scoring system aims to close that loophole. The community median forecast is hidden for a period of time at the start of a question, so forecasters can’t piggy-back on the work of others. A simple bot that forecasts an ignorance prior when the community median is hidden (like 50% on a binary question) will score quite poorly, allowing the best forecasters to get a sizable scoring lead with their skill during the hidden period. The length of time that the median is hidden, and the weight given to forecasts during the hidden period, may vary from question to question and from tournament to tournament. We plan to experiment to see what works best and to tailor the tournament parameters to match the subject matter of a given project.

If Metaculus’s only goal were to reward forecasting accuracy, then we could hide the community median at all times. But Metaculus also aims to help our partners and the world make better decisions based on the most accurate forecasts. We believe that the new tournament scoring system balances these competing interests, rewarding individual skill while also providing a public service.

Our scoring system is a work in progress. As we run more tournaments, we will continue to adjust the tournament scoring framework and parameters as we learn what works best. We’re always eager for feedback, small suggestions, and bold new ideas. We hope to hear from both forecasters and forecast consumers.

### A Simple Scenario

Let’s explore a small concrete example to illustrate the ideas behind our scoring approach. Consider a simple tournament with the following assumptions:

### Battling the Bot

First, it is worth noting that the original Metaculus tournament scoring system showed the community median at all times. Had that been the case in our example above, the bot would have received the same score, s = 0, but its coverage would have been around twice as large, so it would have won around twice as much.

In the recently completed Trade Signal Tournament, where the community median was visible at all times, the bot would have finished in 1st place and won around 15% of the prize pool, so this is not just a theoretical exercise. Simply put, the median forecast combines the wisdom of many individuals and is a difficult benchmark to outperform, which is exactly what Metaculus is trying to achieve! A simple bot can take advantage of this powerful signal.

By hiding the median for half of the time in our example above, we have already reduced the bot’s prize. Is there more that we can do? There is, and that is the motivation behind the introduction of the coverage weight (c_weight) and score weight (s_weight). Let’s explore how tuning those parameters affects the bot.

The table below shows two parameters, s_weight (score weight) and c_weight (coverage weight), that are chosen by the tournament organizer. The s_weight parameter determines the daily weights used when we calculate scores for each question. In the example above we set s_weight = 50%. This means that 50% of a forecaster’s score is determined by their forecasts when the median is hidden. If we set s_weight = 100%, then only the forecasts made while the median is hidden would factor into a forecaster’s score. Similarly, c_weight determines how much of a forecaster’s coverage is determined by the period when the median is hidden.

In the example above, all days were equally weighted for the question score and question coverage. However, if we set c_weight = 100%, then a forecaster’s coverage would be entirely determined by the period when the median is hidden. In this case, the simple bot above gets 0 coverage since it never forecasts when the median is hidden, and it is impossible to win any prize money with 0 coverage. So, setting c_weight = 100% ensures that the simple bot above would not win any money! Here is what our example leaderboard looks like when we set c_weight = 100%.
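To make the coverage weighting concrete, here is a minimal Python sketch. It is not the leaderboard itself and not the Metaculus implementation. The function names, the 10-day question length, and the rule that days within each period share that period's weight equally are all assumptions made for illustration.

```python
def daily_weights(n_days, n_hidden, hidden_weight):
    """Return one weight per day, summing to 1.

    Assumption: the first n_hidden days (median hidden) share hidden_weight
    equally, and the remaining days (median visible) share 1 - hidden_weight.
    """
    if not 0 < n_hidden < n_days:
        raise ValueError("need at least one hidden day and one visible day")
    n_visible = n_days - n_hidden
    hidden = [hidden_weight / n_hidden] * n_hidden
    visible = [(1.0 - hidden_weight) / n_visible] * n_visible
    return hidden + visible


def coverage(weights, active):
    """Sum the weights of the days on which the forecaster had a forecast."""
    return sum(w for w, is_active in zip(weights, active) if is_active)


# Hypothetical 10-day question with the median hidden for the first 5 days.
# The simple bot only forecasts once the median is visible.
bot_active = [False] * 5 + [True] * 5

equal_days = coverage(daily_weights(10, 5, 0.5), bot_active)   # c_weight = 50%
hidden_only = coverage(daily_weights(10, 5, 1.0), bot_active)  # c_weight = 100%
print(equal_days, hidden_only)
```

Under these assumptions the bot's coverage is about one half when c_weight = 50% and exactly zero when c_weight = 100%, which matches the reasoning above. The same `daily_weights` helper could be reused with s_weight to weight daily scores.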

Of course, a more sophisticated bot might adapt to our new scoring system by making a 50% forecast on binary questions when the median is hidden (or a Gaussian pdf centered in the range of a numeric question) in order to boost its coverage, and then copy the median once it is visible. While this bot would indeed win some prize money, we believe that good forecasters should be able to get a sufficient scoring lead during the hidden period to rise above the bot.
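The adaptive strategy described above is simple enough to sketch in a few lines. This is a hypothetical illustration for binary questions only. The name `bot_forecast` and the convention that a hidden median is represented by `None` are assumptions, not part of any Metaculus API.

```python
from typing import Optional

IGNORANCE_PRIOR = 0.5  # 50% on a binary question, as in the text


def bot_forecast(community_median: Optional[float]) -> float:
    """Forecast of the hypothetical adaptive bot on a binary question.

    Assumption: community_median is None while the median is hidden.
    """
    if community_median is None:
        return IGNORANCE_PRIOR  # hidden period: submit the prior to gain coverage
    return community_median     # visible period: copy the community median


# Hypothetical daily medians: hidden for two days, then visible.
medians = [None, None, 0.70, 0.75, 0.80]
print([bot_forecast(m) for m in medians])
```

The bot adds no information in either branch, so any edge a skilled forecaster builds during the hidden period is one the bot cannot match.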

In fact, we’ve done some preliminary analysis using data from the recently completed VA Lightning Round and Trade Signal tournaments to estimate how a more sophisticated bot would have done. Our analysis shows that by hiding the median for 20% to 30% of each question’s duration and setting c_weight = 100%, the bot would have finished below the 50th percentile in those tournaments and earned very little prize money. (In this analysis, we kept the daily score weight equal for all days.)

We plan to adjust the parameters as we learn more about what works best empirically, continually experimenting and improving. We have some more ideas in our development queue that we hope to introduce in the coming months, including forecasting teams and assigning different weights to different questions. These ideas will be explained in an upcoming Part 2 of this post.

Readers are encouraged to play around with our simple model to see the trade-offs of different parameter settings by downloading the spreadsheet here. Any number in blue can be edited by the user to see how the prizes change. Numbers in black or other colors are usually calculations and should not be edited.

We hope that this post helps the community understand how the new Metaculus tournament scoring system works. In the example above, we simplified by using only 1 forecast per day. In reality, forecasters can join a question or change their forecasts at any time, so the log scores and coverage are calculated by integrating over time (rather than summing over days). The concept is exactly the same, but the implementation is slightly more complex than our example. A real tournament will have more questions and more forecasters than our simple example, but it will use the same aggregation explained above. Also, in practice questions will have different hidden periods and overall durations.
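As a rough illustration of integrating over time rather than summing over days, the sketch below treats each forecast as holding until the forecaster's next update and weights its log score by how long it was active. Several things here are assumptions for illustration: the function name, the uniform weighting over time, and the use of a plain log score of the resolved outcome. The actual tournament score may be defined differently.

```python
import math


def time_averaged_log_score(updates, t_open, t_close, outcome):
    """Integrate a piecewise-constant forecast over the question's lifetime.

    updates: list of (time, probability_of_yes) pairs, sorted by time, with
             probabilities strictly between 0 and 1.
    outcome: True if the binary question resolved yes.
    Returns (score, coverage), both normalised by the full duration.
    """
    duration = t_close - t_open
    score = 0.0
    covered = 0.0
    for i, (t, p) in enumerate(updates):
        # Each forecast stays active until the next update or until close.
        start = max(t, t_open)
        end = updates[i + 1][0] if i + 1 < len(updates) else t_close
        end = min(end, t_close)
        if end <= start:
            continue
        p_outcome = p if outcome else 1.0 - p
        score += math.log(p_outcome) * (end - start)
        covered += end - start
    return score / duration, covered / duration


# Hypothetical forecaster: joins at t = 2 with 50%, updates to 80% at t = 6,
# on a question open from t = 0 to t = 10 that resolves yes.
score, cov = time_averaged_log_score([(2.0, 0.5), (6.0, 0.8)], 0.0, 10.0, True)
print(score, cov)
```

With one update per day and unit time steps, this reduces to the daily sum used in the simplified example.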

We look forward to hearing your feedback and questions in the discussion below!

