- The Keep Virginia Safe Tournament was a joint effort between Metaculus and the Virginia Department of Health (VDH)
- Questions were developed collaboratively between the partners in order to obtain insights that would maximize usefulness to policy makers
- Questions spanned 4 focus areas, centered around issues most relevant to Virginia’s pandemic response
- Dates: April 29 2021 until April 30 2022.
- 3 tournament rounds (starting April 29 2021, August 3 2021, November 24 2021)
- There were 86 questions, 224 forecasters, and 14937 forecasts
- 80 questions are already closed, and 6 long-term questions will close after the tournament has ended
- Prize pool of $2,500
The Keep Virginia Safe Tournament was a joint effort between Metaculus and the Virginia Department of Health, and represents a first-of-its-kind collaboration between a forecasting platform and a public health agency. The goal of this joint effort was to harness the power of aggregated crowd prediction to support public health officials at the Virginia Department of Health. To address the most pressing operational needs and decisions of the pandemic response effort, Metaculus collaborated directly with leaders of key VDH divisions, including the Office of Epidemiology, the Office of Health Equity, the COVID-19 Task Force Planning Division, as well as the Community Mitigation, Vaccine, and Testing teams. The Metaculus team also worked closely with the University of Virginia’s Biocomplexity Institute, which provided quantitative modeling and research support to VDH throughout the pandemic. The main point of contact for the collaboration from the Virginia Department of Health was the Director of the Division of Social Epidemiology, Justin Crow, who was assigned to coordinate modeling and foresight activities during the COVID-19 pandemic.
On April 29th, 2021, the Keep Virginia Safe Tournament was officially launched on Metaculus. Overall, the tournament included a total of 14937 forecasts, made by 224 forecasters on 86 questions. Forecasters on Metaculus were incentivized to continuously update their predictions and report their best possible forecasts reflecting the current situation. This ensured that the forecasts available to VDH always represented the best available knowledge at the time.
Combining subject matter expertise with forecasting experience
Forecasting questions for the tournament were developed and continuously reevaluated and adapted in direct collaboration between VDH and Metaculus, combining rich forecasting experience with in-depth subject matter expertise to maximize the usefulness of the forecasts produced. This close cooperation and shared question development process made it possible to quickly launch useful new questions and to respond to changes in the overall situation and current needs.
To achieve this, we pioneered a completely new approach to question development. First, VDH held a series of focused discussions with several COVID-19 response teams to define priorities and key uncertainties with respect to those priorities. Next, Metaculus suggested dozens of question ideas that could be readily operationalized into forecastable questions. VDH then ranked these according to what they thought would be most helpful, added their own suggestions, and made edits to existing suggestions. Lastly, Metaculus finalized and fully operationalized the questions that VDH indicated would be most useful and launched them on the platform. This approach minimized the time spent on question development by assigning tasks that played to the strengths of each side: Metaculus took the lead on suggesting question ideas feasible for forecasting and then turning approved ideas into forecasting questions, while VDH applied its subject matter expertise in public health to outline its key priorities and uncertainties.
From insight to action
Metaculus forecasts were reported during department planning meetings, and were shared widely during statewide virtual meetings with Local Health Department staff, statewide epidemiologist seminars, and during “partner calls” with external stakeholders. On at least one critical occasion, forecasts were shared with the governor. They were also shared on public-facing weekly reports and blog posts. Most reporting was ad hoc, with VDH staff pulling relevant forecasts to inform on topics under discussion and to support situational awareness. Forecasts were most often reported alongside other information, including surveillance updates, administrative program data, and more traditional quantitative modeling performed by the University of Virginia Biocomplexity Institute.
Facing uncertainty - the situation in Virginia
At the start of the Keep Virginia Safe Tournament, public health officials at the Virginia Department of Health were facing key decisions for the summer and fall of 2021. In particular, the timing and magnitude of a peak were of great concern in order to ensure that the state had enough hospital capacity. Uncertainties stemmed from the potential seasonality, including the draw of summer activities, the timing and speed of Virginia’s vaccine rollout and vaccine acceptance/hesitancy of population groups within Virginia, workplace attendance and various community mitigation policies.
Forecasts also supported design and resource planning (including staffing decisions) for various programs including vaccinations, contact tracing, and testing. In addition, VDH needed to develop guidance for possible school reopenings in the fall. With a $250M budget allocated to testing within schools, it was critical to determine how to operationalize the K-12 screening plan and how to best allocate resources across the state. To support these decisions, we developed a series of forecasts estimating how many school districts would participate in testing programs, the likelihood and timing of FDA approval of a vaccine for children under 12, which regions in the state were likely to be “in surge” throughout the summer and fall, and more. Then, when the Omicron variant emerged, Metaculus provided VDH with forecasts on the timing and magnitude of the peak in cases, hospitalizations, and deaths for the Omicron-driven wave.
The forecasts generated by Metaculus are particularly suited to the needs of public policy makers such as those in the Virginia Department of Health. Metaculus provides full predictive distributions, meaning that policy makers obtain not just a single best guess but an explicit quantification of the uncertainty in outcomes. This gives them a better picture of possible outcomes and especially of the likelihood of extreme events. Forecasters continuously update their forecasts based on new information, meaning that predictions always represent the currently available information. Human forecasters on Metaculus also have several advantages over computer-based modeling in the context of informing public health policy in real time. They can provide an early information signal because they are able to incorporate incomplete data and qualitative information, such as changes in trends that are hard to capture in numbers. Many forecasters explain their reasoning in the comment section of the Metaculus website, providing additional information to policy makers. Compared to mathematical models, human forecasters are also more flexible and can quickly answer new questions when they arise without requiring time to develop and tune models. The machine-learning-optimized Metaculus algorithm aggregates individual predictions with the aim of producing the most accurate forecasts.
Forecasting targets were chosen to maximize the usefulness of the forecasts to policy makers. Together, VDH and Metaculus identified four key areas of questions:
1) The COVID-19 Epidemiological Trajectory
Information on the future trajectory of COVID-19 is central to a variety of aspects of Virginia’s pandemic response. Forecasting questions were designed to help answer questions such as: Are existing measures and resources sufficient? Can schools be opened safely, and if so would that require measures such as mask use or increased testing efforts? How likely is it that future variants that escape immunity will emerge and what preparations need to be made? Is it safe to plan gatherings for holidays?
2) The Path to Population Immunity
Forecasts about population immunity helped VDH navigate and anticipate the impact of the still new vaccination campaign. They helped answer questions like: Will vaccination succeed in curbing infections enough to suppress transmission? Which vaccine distribution channels would have the most demand, and how would that change as the campaign matured? How would vaccine uptake vary? Which groups (e.g., age, race, ethnicity) would reach vaccination benchmarks first, and which would require more effort?
3) Return to Normal
Forecasts about when and how Virginia would return to normal provided insight into the changing social, political, and economic effects of the pandemic. They helped to track the public’s reaction to the virus and the response, and tested assumptions about the “end game” of the pandemic. Relevant questions included: How will demand for testing strategies need to be adapted to accommodate return to school and work? Will pandemic response efforts successfully facilitate a return to normalcy, and will the public trust that the pandemic has been successfully mitigated? If it is not mitigated, will there be political and social will to continue to comply with public health guidance?
4) Health Equity
The COVID-19 pandemic has hit different parts of Virginia’s population differently. Forecasts about health equity helped the Virginia Department of Health mitigate the effects of COVID-19 on the most vulnerable. Relevant policy questions were, for example: Will efforts to close equity gaps be successful, and among which populations? Which populations may need additional effort or focus? Will additional resources be necessary to reduce health inequality? Are current outreach policies working, or do they need to be adapted?
Number of questions, forecasts and forecasters
Throughout the tournament, there were 86 questions, 11 of them in a binary format (asking for a yes or no outcome) and 75 of them asking for a date or a specific quantity (such as hospitalizations).
In total, there were 224 forecasters, and 14937 forecasts, meaning that on average, every forecaster made 66.7 predictions. Many forecasters made over 100 forecasts and the most active over 1000. Most questions received well over one hundred forecasts (see Figure 1). Out of 86 questions, 56 questions have currently resolved within the bounds of the original question. 32 of these asked for a non-date quantity, 13 asked for a date, and 11 for a binary forecast.
Continuous (non-date) questions
The median of the final forecast was within 10 percent of the observed value for 19 out of 30 questions, and within 50 percent of the observed value for 27 questions.
Date questions
The final forecast was within 7 days of the observed date for 7 out of 13 questions, and within 14 days for 9 questions. All differences between predicted and actual dates were zero or negative, meaning that events tended to happen earlier than predicted on average.
Binary questions
Forecasters assigned more than 50% probability to the outcome that was eventually observed on 5 of the 7 questions that have already resolved. The average Brier score of the Metaculus prediction was 0.15.
Forecasts were continuously updated. Peaks in activity occurred whenever new tournament rounds were launched.
Figure 2: Distribution of errors for continuous questions. A: Accuracy of the continuous (non-date) forecasts compared to the observed values. The percentage error indicates how much larger or smaller the forecast was in percentage terms relative to the corresponding observed value. B: Difference between the median final forecast and the observed date for date questions.
Non-date (discrete or continuous) questions asked for a quantity such as the percentage of Virginia’s population that is vaccinated at a given time or the number of communities that will experience high community transmission. Of the 30 such questions that have resolved, the median of the final forecast was within 10 percent of the observed value for 19 and within 50 percent of the observed value for 27. These numbers have to be interpreted with some caution (for example, forecasts were updated over time and questions varied in difficulty), but they do give a good intuition for the overall accuracy of the Metaculus forecasts. Table 1 shows the questions for which Metaculus forecasts were least accurate, i.e. questions for which the median final forecast was more than 50 percent higher or lower than the observed value.
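The percentage-error summary described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for this report: the function names and the forecast/observation pairs are hypothetical, and the signed error is assumed to be measured relative to the observed value.

```python
def percentage_error(forecast: float, observed: float) -> float:
    """Signed error of a forecast relative to the observed value, in percent."""
    if observed == 0:
        raise ValueError("percentage error is undefined for an observed value of 0")
    return 100.0 * (forecast - observed) / observed


def count_within(errors: list[float], threshold: float) -> int:
    """Count forecasts whose absolute percentage error is at most `threshold`."""
    return sum(1 for e in errors if abs(e) <= threshold)


# Hypothetical (median final forecast, observed value) pairs, NOT tournament data.
pairs = [(62.0, 60.0), (105.0, 100.0), (30.0, 50.0), (400.0, 250.0)]
errors = [percentage_error(f, o) for f, o in pairs]

print([round(e, 1) for e in errors])
print(count_within(errors, 10.0), "of", len(errors), "within 10 percent")
print(count_within(errors, 50.0), "of", len(errors), "within 50 percent")
```

A positive error means the forecast was higher than the observed value, a negative error that it was lower.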
Table 1: Continuous questions for which the median of the final forecast was more than 50 percent off compared to the observed value.
Date questions asked for the exact time of an event, for example "When will the CDC eliminate quarantine restrictions for close contacts of COVID-19 cases?" or "When will a SARS-CoV-2 vaccine be granted emergency use authorization by the US FDA for children under 12 years old?" Of the 13 date questions, the final forecast was within 7 days of the observed date for 7 and within 14 days of the observed date for 9. Again, these numbers need to be interpreted with care, but overall Metaculus forecasts seemed to be reasonably accurate for most questions. Interestingly, all differences between predicted and actual dates were zero or negative, meaning that events tended to happen earlier than predicted on average. Table 2 gives an overview of the questions for which Metaculus forecasts were least accurate, showing questions for which final forecasts were more than 14 days away from the observed date.
Table 2: Date questions for which final forecasts were more than 14 days off compared to the observed date.
Binary questions asked for the probability of a given event, for example "Will Virginia announce a vaccine mandate for its state workforce before 1 October 2021?" Of the 11 binary questions, 7 have already resolved. Forecasters had assigned more than 50% probability to the outcome that was eventually observed on 5 of these 7 questions. For the other 2, forecasters assigned less than 50% probability to the outcome that was finally observed (see Table 3).
The quality of binary forecasts can be evaluated using the Brier score, a proper scoring rule, which means that forecasters cannot win points by 'cheating' and are incentivized to report their true best belief. The Brier score is the squared distance between prediction and outcome. For example, a prediction of 80% probability for an event that happens receives a Brier score of $(1 - 0.8)^2 = 0.04$. The average Brier score of the community prediction was 0.15, where 0 is perfect omniscience, 1 is the worst possible score, and 0.25 is the score assigned to a forecaster who doesn’t know anything and always predicts a probability of 0.5. A score of 0.15 suggests that the Metaculus forecasts performed reasonably well and clearly outperformed the agnostic baseline of an unknowing forecaster.
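As a minimal sketch of the definition above, the Brier score for a set of binary forecasts can be computed as follows. The function names and the example forecasts are hypothetical and are not the tournament data. Outcomes are coded as 1 if the event happened and 0 otherwise.

```python
def brier_score(probability: float, outcome: int) -> float:
    """Squared distance between a forecast probability and the 0/1 outcome."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 (did not happen) or 1 (happened)")
    return (probability - outcome) ** 2


def mean_brier_score(forecasts: list[tuple[float, int]]) -> float:
    """Average Brier score over (probability, outcome) pairs."""
    return sum(brier_score(p, y) for p, y in forecasts) / len(forecasts)


# The example from the text: 80% assigned to an event that happened.
print(round(brier_score(0.8, 1), 4))

# The agnostic baseline: always predicting 0.5 scores 0.25 whatever happens.
print(brier_score(0.5, 0), brier_score(0.5, 1))

# Hypothetical (probability, outcome) pairs, NOT tournament data.
example = [(0.9, 1), (0.2, 0), (0.6, 0), (0.7, 1)]
print(round(mean_brier_score(example), 4))
```

Lower scores are better, so an average below 0.25 indicates forecasts that were more informative than always predicting 0.5.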
Table 3: Questions for which forecasters assigned more probability to the outcome that was not observed.
In addition to accuracy, calibration is an important feature of good forecasts. Good calibration implies that forecasters are able to correctly assess their own uncertainty, instead of being overly cautious or assigning high probability to events that do not occur. One intuitive way to assess calibration is to ask "how often were forecasters right for a given level of certainty (probability assigned to an outcome)?" For binary questions, certainty is simply the predicted probability. For quantity and date questions, uncertainty is represented by the width of the predictive distribution. The most intuitive way to assess it is to look at the central 50% prediction interval, i.e. the range of outcomes within which forecasters believe the observed value will fall with 50% probability. For the 43 date and non-date questions combined, the observed value was within the 50% prediction interval of the Metaculus prediction 21 times (48.8% of the time), indicating that forecasts were well calibrated and provided a reasonable quantification of uncertainty. For the binary questions, assessing calibration is harder, as there are only 7 resolved questions. For these questions, forecasters were never wrong when they expressed more than 70% certainty that an event would either happen or not happen.
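The interval-coverage check described above can be sketched as follows. This is an illustrative example under stated assumptions: the function name and the intervals are hypothetical, each central 50% prediction interval is assumed to be given by its lower and upper bound (the 25% and 75% quantiles of the predictive distribution), and an observation on the boundary is counted as inside.

```python
def interval_coverage(
    intervals: list[tuple[float, float]], observed: list[float]
) -> tuple[int, float]:
    """Return (number of hits, share of hits) for observations inside their interval."""
    if len(intervals) != len(observed):
        raise ValueError("need exactly one observed value per interval")
    hits = sum(1 for (lower, upper), y in zip(intervals, observed) if lower <= y <= upper)
    return hits, hits / len(observed)


# Hypothetical central 50% intervals and observed values, NOT tournament data.
intervals = [(10.0, 20.0), (0.4, 0.6), (100.0, 150.0), (5.0, 8.0)]
observed = [15.0, 0.7, 120.0, 9.0]
print(interval_coverage(intervals, observed))

# The figure reported in the text: 21 of 43 observations inside the 50% interval.
print(round(100 * 21 / 43, 1))
```

For well-calibrated forecasts, the share of observations inside the 50% interval should be close to 50%. Shares well below that suggest overconfident (too narrow) distributions, and shares well above suggest underconfident (too wide) ones.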
While the final predictions are important, a lot of the value of the forecasts comes from the signal they provide over time. Forecasts for all questions were continuously updated as new information became available, providing decision makers at VDH with the best available knowledge in real time. Figure 3A shows a timeline of forecasting activity with continuous engagement over the whole period and spikes in activity when new questions were released. On average (but not always), updating should mean that forecasts also improve over time. This is indeed true in general for the Metaculus forecasts, although not for every individual question. Panels B-D in Figure 3 show the evolution over time for the Brier score (for binary questions), the percentage error (non-date questions) and the absolute error in days (date questions) of the median forecast.
Figure 3: Forecast updates over time. A: Brier score of binary questions. A Brier score of 0.25 (dashed line) represents the score for an agnostic forecaster who always predicts a probability of 0.5. B: Difference between median predicted date and observed date for date questions over time. C: Percentage difference between median predicted value and observed value over time. D: Histogram with timings of forecasts.
The Keep Virginia Safe Tournament was a first-of-its-kind collaboration between a prediction platform and a public health agency. It helped turn insight into action by making distributed forecasting expertise available to decision makers at the Virginia Department of Health. Importantly, VDH was not only a forecast consumer, but was able to set key priorities for forecast elicitation and was actively involved in the question development process. This setup not only allowed for a timely provision of the most relevant and up-to-date forecasts, but it also led to a high level of satisfaction from both sides. The collaboration was able to successfully combine deep subject matter expertise with forecasting experience to deliver actionable insights.
Anchoring Metaculus forecasts among more traditional information sources, including surveillance reporting and quantitative models, improved acceptance and allowed busy decision makers to gain some familiarity with Metaculus products. In this context, Metaculus forecasts acted as useful benchmarks and filled gaps. For instance, VDH tracked bed capacity in hospitals, and UVA Biocomplexity Institute models projected bed use. However, there was little information available on staffing, a major limitation as the Omicron wave approached. Metaculus forecasts on average travel nurse salaries in Virginia filled the gap, contributing to key decisions on hospital and workforce flexibility. In addition to Virginia-specific questions, VDH staff were able to pull from the wide array of forecasts available on the Metaculus platform.
Areas for improvement
The Keep Virginia Safe Tournament was a pilot project implemented during the height of a fast-moving public health emergency. While there were some notable successes, there are still lessons to be learned and areas for improvement. While some questions proved to be useful for policy discussions, few were linked directly to policy decisions. Further work and research is needed in this area.
Communication of Metaculus forecasts to lay audiences could be challenging at times, especially if forecasts were multi-modal, there were significant “off-scale” forecasts, or forecasts did not translate well to simple numeric presentation. Forecasts were sometimes refigured to create consumable results. For instance, questions about which month would experience peaks in cases and hospitalizations were converted into probabilities of whether “the worst was behind us”. (Stale forecasts could make this fraught, as they often included probabilities covering multiple past months, when only one past month could possibly be the peak.) Linking Metaculus forecasts to policy decisions and improving communications are key goals of future tournaments.
Areas for further collaboration
VDH and Metaculus are planning a second tournament, addressing the areas of improvement noted above. COVID-19 and its effects will necessarily be featured, along with its long-term physical and mental health impacts. Areas outside of COVID-19, such as infant mortality, substance use disorder, and sexually transmitted infections will also be addressed.
VDH has also launched a new Foresight and Analytics unit within its Office of Emergency Preparedness. Over the long term, Metaculus forecasts may contribute to threat assessments and benchmarking. Short-term questions or tournaments informing on emerging threats are also a key area for collaboration. Metaculus, the UVA Biocomplexity Institute, and VDH have also pursued an innovative collaboration combining the benefits of traditional quantitative models and aggregate human forecasts. Quantitative models are highly dependent on assumptions for key parameters such as the characteristics of new variants, behavioral or policy responses, vaccine effectiveness, or waning immunity. Parameters for uncertain quantities are often selected by the modelers themselves. These may be used in scenario projections, or for making forecasts about future quantities. This collaboration is exploring using Metaculus forecasts to select parameters for quantitative models. These can be used to better define potential scenario arrays, select the most likely scenario among an array, or to create a Metaculus-informed forecasting model. We are certain to find other areas to explore, and look forward to further collaboration.
We are grateful for the contributions of Caroline Holsinger and Justin Crow of the Virginia Department of Health, who made this collaboration a success. We would also like to thank Gaia Dempsey, Tom Liptay, and Juan Cambeiro for making the tournament possible from Metaculus’s end and Nikos Bosse for writing this report.
Example questions for the four key areas
The COVID-19 Epidemiological Trajectory
- Will variants of concern thought to partially escape immunity make up more than 50% of samples sequenced in Virginia on 29 August - 11 September 2021?
- How many new COVID-19 outbreaks will occur in Virginian long-term care facilities before 1 August 2022?
- In Virginia, which month between May 2021 and March 2022 (inclusive) will have the highest number of new COVID-19 hospitalizations?
- When will Virginia’s weekly total of new confirmed and probable COVID-19 hospitalizations fall below 49?
- How many of Virginia’s 133 communities will be experiencing moderate or higher levels of community transmission as of 7 March 2022?
The Path to Population Immunity
- What will the percent of Virginia’s population vaccinated with at least one dose be on 1 August 2021?
- When will the percent of Virginia’s population vaccinated with at least one dose reach 75%?
- Which age group will have the highest share of Virginia’s new COVID-19 cases during the week ending 1 August 2021?
Return to Normal
- According to the Virginia Department of Education, how many Virginia school divisions will have “In Person” instructional schedules as of 15 September 2021?
- When will the CDC eliminate quarantine restrictions for close contacts of COVID-19 cases?
- When will Virginia’s 6-foot distancing requirement for food and beverage establishments be lifted?
- When will a SARS-CoV-2 vaccine be granted emergency use authorization by the U.S. FDA for children under 12 years old?
- What will Virginia’s percent unemployment rate be in April 2022?
Health Equity
- What will the cumulative vaccination rate ratio for Black Virginians be in July 2021?
- What will the cumulative vaccination rate ratio for Hispanic Virginians be in July 2021?