In order to be the best possible predictor — to climb to the top of the rankings and establish yourself as a Metaculus Time Lord — you'll need to gather as much data as possible, filter out all the noise, and distill it down to a series of insightful projections that can pinpoint the future with laser-like accuracy. It's not an easy task, but by making smart use of the resources presented here you should be well on your way.
Please note that this page is a continual work in progress! If you have a useful resource to add, let us know in the associated discussion comments or send us a note at email@example.com.
Table of contents
- Analysis tools
- Tutorials, textbooks and other resources
- Tips on how to become a better predictor
- Data sources
Analysis tools
- Guesstimate: a simple web-based tool to model uncertainties in calculations. Guesstimate's interface is similar to other spreadsheet tools, such as Excel or Google Sheets. Each model is a grid of cells, and each cell can be filled with a name and value. Functions can be used to connect cells together to represent more complex quantities.
For example, consider the question series about the Fermi paradox. We may use the Drake equation (a "back of the envelope" estimate of whether there is intelligent life in the Milky Way other than us humans) to estimate the number of intelligent civilizations in our galaxy based on seven different variables (see the Drake equation). Each guess has its own uncertainties, and with Guesstimate you can multiply the guesses and their uncertainties together to get a probability distribution of the number of intelligent civilizations. See the following model by a Guesstimate user on this probability. Also check out public models, and don't forget to post your models in the comments of questions for others to see!
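If you prefer code to a spreadsheet, the same back-of-the-envelope logic can be sketched in a few lines of Python. The ranges below are purely illustrative placeholders, not estimates from any particular Guesstimate model:

```python
import math
import random

random.seed(0)

def lognormal_guess(low, high):
    """Draw a value whose central 90% interval roughly spans [low, high]."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * 1.645)
    return random.lognormvariate(mu, sigma)

def drake_sample():
    # Illustrative ranges for each Drake equation factor -- not authoritative.
    R_star = lognormal_guess(1, 10)      # star formation rate per year
    f_p    = lognormal_guess(0.2, 1)     # fraction of stars with planets
    n_e    = lognormal_guess(0.5, 5)     # habitable planets per such star
    f_l    = lognormal_guess(0.01, 1)    # fraction that develop life
    f_i    = lognormal_guess(0.001, 1)   # fraction that develop intelligence
    f_c    = lognormal_guess(0.01, 1)    # fraction that communicate
    L      = lognormal_guess(100, 1e7)   # years a civilization broadcasts
    return R_star * f_p * n_e * f_l * f_i * f_c * L

samples = sorted(drake_sample() for _ in range(10000))
print("median N:", samples[len(samples) // 2])
print("rough 90% interval:", samples[500], "-", samples[9500])
```

Multiplying the seven uncertain factors sample-by-sample yields a full distribution for N, rather than a single point estimate, which is exactly what Guesstimate automates.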
- Spreadsheets such as Excel or Google Sheets for both theoretical modelling and basic statistical analysis. Spreadsheets offer similar options to Guesstimate: you can create theoretical models to factorize questions, produce estimates for subquestions, and run basic Monte Carlo simulations (see here for an example of such a simulation). Basic statistical analysis (descriptive statistics, correlations, regressions, and so on) is also convenient in Excel (see here for more information). Finally, spreadsheets created in Google Sheets can be shared in the comments, allowing others to view your work.
- Statistical Software, like R, for more advanced statistical computing (linear and nonlinear modeling, classic statistical tests, time-series analysis, classification, clustering) and graphics. You can download it here for free.
- Probability Distribution Calculators such as the Normal distribution calculator, the Binomial distribution calculator, and the Poisson distribution calculator. Lastly, check out this Bayes Rule Calculator for updating your credence for yes/no questions given new information.
- HASH: system modeling software that can generate and inform forecasts of complex systems. HASH can be used to represent complex systems and run "what-if" scenarios, to hone your intuitions and improve your predictions.
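The Bayes Rule Calculator mentioned above performs a computation that is easy to reproduce yourself. A minimal Python sketch, using made-up inputs (a 30% prior, and evidence twice as likely if the question resolves yes as if it resolves no):

```python
def bayes_update(prior, p_evidence_given_yes, p_evidence_given_no):
    """Posterior P(yes | evidence) for a binary question via Bayes' rule."""
    numerator = prior * p_evidence_given_yes
    denominator = numerator + (1 - prior) * p_evidence_given_no
    return numerator / denominator

# Illustrative update: a 30% prior credence, evidence with likelihoods 0.8 vs 0.4.
posterior = bayes_update(0.30, 0.8, 0.4)
print(round(posterior, 3))  # 0.462
```

Note that evidence twice as likely under "yes" moves a 30% prior only to about 46%; strong updates require strongly diagnostic evidence.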
Tutorials, textbooks and other resources
- Join the Social Science Prediction Platform, which supports the "systematic collection and assessment of expert forecasts of the effects of untested social programs." It is designed to assist policy makers and social scientists by improving the accuracy of forecasts, thereby leading to more effective decision-making and improvements to experimental design and analysis.
- Play Calibrate Your Judgment, an interactive calibration tutorial produced by the Open Philanthropy Project. This is perhaps the most useful free online calibration training currently available. Note that you must sign in with a GuidedTrack, Facebook, or Google account so that the application can track your performance over time.
- AI Impacts' Evidence on good forecasting practices from the Good Judgment Project summarizes the findings of the Good Judgment Project, the winning team in IARPA's 2011-2015 forecasting tournament. The article describes the various correlates of successful forecasting, as well as the heuristics, forecasting methodologies, philosophical outlooks, and thinking styles that were associated with better predictions. It also includes a helpful "recipe" for making predictions that describes how superforecasters (the top 0.2% of forecasters) go about making their predictions.
Forecasting: Principles and Practice provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly. The book is concise, easy to read, and presumes only basic statistics knowledge.
The book presents the key concepts of forecasting, from judgmental forecasting (which can be useful when you have little or no data) to simple/multiple regression, time series decomposition, exponential smoothing (ETS), and a few more advanced topics such as neural networks (all in R). The book is oriented toward practical advice on making predictions, and does not attempt a thorough discussion of the theoretical details behind each method.
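As a taste of the book's simplest smoothing method, here is simple exponential smoothing in plain Python (the book's own examples use R; the data and smoothing parameter below are made up for illustration):

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: the level is a weighted average of the
    newest observation and the previous level; returns the one-step-ahead
    forecast, which equals the final smoothed level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

data = [30, 32, 31, 35, 34, 36, 38]   # hypothetical monthly observations
print(exponential_smoothing(data, alpha=0.5))  # 36.375
```

Larger values of `alpha` weight recent observations more heavily; `alpha=1` reduces to the naive "last value" forecast.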
Open Textbooks on Forecasting and Related Courses by Francis Diebold, and especially his Time-Series Econometrics: Forecasting, which provides an upper-level undergraduate / masters-level introduction to forecasting, broadly defined to include all aspects of predictive modeling, in economics and related fields. Having used this book for my macroeconometrics course, I highly recommend it, especially for the modeling of autoregressive processes for making point and density forecasts (which are especially useful for numeric-range predictions on Metaculus).
The topics covered include: regression from a predictive viewpoint; conditional expectations vs. linear projections; decision environment and loss function; the forecast object, statement, horizon and information set; the parsimony principle, relationships among point, interval and density forecasts, and much more. The book can be found here, and the lecture slides covering material in the book can be found here. Diebold's resources are licensed under Creative Commons.
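To illustrate the kind of point and density forecasts autoregressive models produce, here is a hand-rolled AR(1) forecast in Python. The fitted parameters are hypothetical placeholders, not estimates from any real series:

```python
import math

# AR(1) model: y_t = c + phi * y_{t-1} + eps_t, with eps ~ N(0, sigma^2).
# Hypothetical "fitted" parameters and last observation -- illustrative only.
c, phi, sigma = 1.0, 0.8, 0.5
y_last = 10.0

def ar1_forecast(h):
    """h-step-ahead point forecast, plus the standard deviation of the
    (Gaussian) density forecast, which widens with the horizon."""
    point = y_last
    for _ in range(h):
        point = c + phi * point
    # Forecast-error variance accumulates: sigma^2 * (1 + phi^2 + ... + phi^(2(h-1)))
    var = sigma**2 * sum(phi**(2 * k) for k in range(h))
    return point, math.sqrt(var)

point, sd = ar1_forecast(3)
# A central 95% interval from the 3-step-ahead density forecast:
print(point - 1.96 * sd, point + 1.96 * sd)
```

The point forecast decays toward the process mean while the interval widens, which is the shape of information a numeric-range Metaculus prediction asks for.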
Tips on how to become a better predictor
Avoid overconfidence. Overconfidence is a common finding in the forecasting research literature, and was found to be present in a 2016 analysis of Metaculus predictions. Overconfidence comes in many forms, such as overconfidence in intuitive judgements, explicit models, or (your or others') domain-specific expertise.
Generally, overconfidence leads people to:
- neglect decision aids or other assistance, thereby increasing the likelihood of a poor decision. In experimental studies of postdiction in which subjects were provided decision aids, subject-level expertise (and thereby confidence) was found to correlate with lower use of reliable decision aids, and worse predictions overall.
- make predictions contrary to the base rate. The base rate is the prevalence of a condition in the population under investigation. To expect the future to be substantially different from the past, one must have good evidence that (i) some process crucial to bringing the usual result about will fail, and (ii) the replacement process will produce a different outcome. Bayes' rule teaches us that to predict unlikely events we must have highly diagnostic information (information that you'd be unlikely to observe in the usual case), yet predictors often rely on their confidence rather than the diagnosticity of their evidence when going against the base rate.
To counteract overconfidence, forecasters should heed five principles: (1) Consider alternatives, especially in novel or unprecedented situations for which data is lacking; (2) List reasons why the forecast might be wrong; (3) In group interaction, appoint a devil's advocate (or play the devil's advocate in the comment section!); (4) Obtain feedback about predictions (by posting them in the comments, for example); (5) Treat the feedback you receive as valuable information.
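The base-rate point can be made concrete with Bayes' rule in odds form: to see how diagnostic your evidence must be before going against a low base rate, multiply the prior odds by the likelihood ratio. The numbers below are illustrative:

```python
def posterior_from_likelihood_ratio(base_rate, lr):
    """Posterior probability given a base rate and a likelihood ratio
    lr = P(evidence | event) / P(evidence | no event)."""
    prior_odds = base_rate / (1 - base_rate)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1 + posterior_odds)

# With a 5% base rate, even evidence 4x likelier under the event
# leaves the posterior well below 50%:
print(round(posterior_from_likelihood_ratio(0.05, 4), 3))   # 0.174
# Merely reaching 50% requires highly diagnostic evidence (LR = 19):
print(round(posterior_from_likelihood_ratio(0.05, 19), 3))  # 0.5
```

If your evidence is not this diagnostic, the base rate should continue to dominate your forecast.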
- Break seemingly intractable problems into tractable sub-problems. This is Fermi-style thinking. Enrico Fermi designed the first atomic reactor. When he wasn’t doing that he loved to tackle challenging questions such as “How many piano tuners are in Chicago?” At first glance, this seems very difficult. Fermi started by decomposing the problem into smaller parts and putting them into the buckets of knowable and unknowable. By working at a problem this way you expose what you don’t know or, as Tetlock (2016) puts it, you “flush ignorance into the open.”
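Fermi's piano-tuner decomposition reduces to a few multiplications once the sub-estimates are written down. Every input below is a rough placeholder guess, which is exactly the point of the method:

```python
# Hypothetical inputs for Fermi's piano-tuner estimate -- each is a rough,
# independently checkable guess, not a researched figure.
population        = 2_500_000   # people in Chicago
people_per_house  = 2           # average household size
pianos_per_house  = 1 / 20      # share of households with a piano
tunings_per_year  = 1           # tunings per piano per year
tunings_per_day   = 4           # jobs one tuner can do per day
working_days      = 250         # working days per year

pianos = population / people_per_house * pianos_per_house
demand = pianos * tunings_per_year                  # tunings needed per year
supply_per_tuner = tunings_per_day * working_days   # tunings one tuner supplies
print(round(demand / supply_per_tuner))             # ~62 piano tuners
```

Each factor can be sanity-checked or researched separately, which is how decomposition "flushes ignorance into the open."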
- Discover the relevant base rate. A Metaculus Time Lord knows that there is nothing truly new under the sun. The best forecasters often conduct creative searches for comparison classes, even for seemingly unique events, and pose the question: how often do things of this sort happen in situations of this sort? Identify comparison classes for events, and let your predictions be informed by the base rate of occurrence in this class of events. This is often easier and more effective than trying to understand the event's workings from first principles.
- Combine a systematic 'model-thinking' approach with an intuition-based approach. While it is often good to use a systematic 'model-thinking' approach built on explicit theoretical or statistical reasoning, you should generally also use an intuition-based approach to predicting. When these two approaches yield different answers, think carefully about whether your question is the type that is better answered with intuitive judgments or with systematic modelling, and combine the two answers accordingly to inform your prediction. According to Kahneman, intuitive judgements about some subject are likely to be accurate only when the following three conditions hold:
- The relevant subject exhibits a large degree of regularity
- One has had a sufficient amount of exposure to this subject to have been able to pick up the relevant regularities
- One has received enough feedback to evaluate previous intuitive judgments
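One simple way to combine a model-based forecast with an intuitive one is to average their log-odds. This is a sketch, not the only defensible pooling rule, and the 50/50 weight below is an illustrative choice rather than a recommendation:

```python
import math

def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1 - p))

def pool(p_model, p_intuition, w_model=0.5):
    """Combine two probability forecasts via a weighted average of log-odds.
    w_model is the weight on the model-based forecast (illustrative default)."""
    combined = w_model * logit(p_model) + (1 - w_model) * logit(p_intuition)
    return 1 / (1 + math.exp(-combined))    # log-odds -> probability

# A 70% model-based forecast and a 40% gut feeling pool to roughly 55%:
print(round(pool(0.70, 0.40), 3))
```

Shifting `w_model` toward 1 is appropriate when the question rewards explicit modelling, and toward 0 when Kahneman's three conditions for reliable intuition hold.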
- Look for the errors behind your mistakes. It's easy to justify or rationalize your failure. Don't. Own it, evaluate your track record (both resolution and calibration), and compare it to the community track record. You want to learn where you went wrong and determine ways to get better. And don't just look at failures. Evaluate successes as well, so you can determine whether you used reliable techniques for producing forecasts or whether you were just plain lucky. For example, if you have an average log-score above 0.2, this might be evidence of overconfidence, in which case you should follow the tips on counteracting overconfidence presented above.
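Exact scoring conventions vary (and Metaculus has its own), but the core log-score idea for binary questions is simple: you earn log(p) when the event happens and log(1 - p) when it doesn't. A minimal version:

```python
import math

def avg_log_score(forecasts_and_outcomes):
    """Mean natural-log score over (probability, outcome) pairs.
    Closer to 0 is better; a constant 50% forecast earns -log(2) ~ -0.693."""
    total = 0.0
    for p, happened in forecasts_and_outcomes:
        total += math.log(p if happened else 1 - p)
    return total / len(forecasts_and_outcomes)

# A hypothetical four-question track record:
track_record = [(0.9, True), (0.8, True), (0.3, False), (0.6, False)]
print(round(avg_log_score(track_record), 3))
```

Comparing this number against the community's score on the same questions is what reveals whether your edge is skill or luck.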
- Share your work in the question's comments section. Sharing your theoretical reasoning (such as posting your Guesstimate model), statistical reasoning, information/data sources, or dependencies with others is good practice not just because you’re providing a valuable public good for our understanding of the future, but also because others may supplement your work with additional insight.
General Data Sources (in no particular order)
| Data Service | Organization | Topics | Size | Ease of Use | Comments |
|---|---|---|---|---|---|
| Public Data Explorer | Google | All topics | Very large: aggregates public data from 113 dataset providers (such as international organizations, national statistical offices, non-governmental organizations, and research institutions) | A good place to start your search for data, since many datasets are available and are often straightforward to find; there are sometimes also great visualizations | This is perhaps the best place to look for public data and forecasts provided by third-party data providers. Highly recommended also is the International Futures Forecasting Data on long-term forecasting and global trend analysis, available on the Public Data Explorer |
| Our World in Data | The Oxford Martin Programme on Global Development at the University of Oxford | Global living conditions: Health, Food Provision, The Growth and Distribution of Incomes, Violence, Rights, Wars, Culture, Energy Use, Education, and Environmental Changes | Small: aggregates some hundreds of datasets, all of which are well organized and given appropriate context | There are excellent visualizations. For each topic, the quality of the data is discussed and, by pointing the visitor to the sources, this website is also a database of databases. Covering all of these aspects in one resource makes it possible to understand how the observed long-run trends are interlinked | Highly recommended for big-picture questions about the human condition |
| Data.gov | Various branches of the U.S. Government | Agriculture, Climate, Consumer, Education, Energy, Finance, Health, Manufacturing, Public Safety, Science and Research | Very large: over 285,000 datasets from most federal departments, city governments, universities, NGOs, and the private sector | You need to enter good search queries to get a short list of relevant results | You can really find data on almost anything |
| The World Bank Open Data | The World Bank | Agriculture & Rural Development, Aid Effectiveness, Climate Change, Economy & Growth, Education, Energy & Mining, Environment, Financial Sector, Gender, Health, Infrastructure, Poverty, Science & Technology, Social Development, Trade, Urban Development | Large: 17,445 datasets available | Easy | Their datasets on Science & Technology may be especially relevant to Metaculus questions |
| UNData | United Nations Statistics Division | Agriculture, Crime, Education, Employment, Energy, Environment, Health, HIV/AIDS, Human Development, Industry, Information and Communication Technology, National Accounts, Population, Refugees, Tourism, Trade, as well as the Millennium Development Goals indicators | Large | Very easy | Very intuitive interface for dataset searching |
| Global Health Observatory Data Repository | The World Health Organization | Health-related topics | Moderately large: 1,000 indicators for its 194 member states | You can browse the data by theme, category, or indicator | Excellent for health-related questions, such as those involving pandemics, antimicrobial resistance, and malaria |
| OECDstat | Organisation for Economic Co-operation and Development (OECD) | Technology and Patents, Development, Environment, Globalisation, Finance, Health, Industry, Information and Communication Technology, Productivity, Social Protection and Wellbeing, Transport, and more | | Very easy: their online statistical database permits Google-like keyword search | |
Macroeconomic & Financial Only Data Sources (in no particular order)
| Data Service | Organization | Topics | Size | Ease of Use | Comments |
|---|---|---|---|---|---|
| Bureau of Economic Analysis | U.S. Department of Commerce | Official macroeconomic and industry statistics, most notably reports about the gross domestic product (GDP) of the United States, as well as personal income, corporate profits, and government spending | Large | Easy | |
| Yahoo Finance | Yahoo | Financial news, data, and commentary, including stock quotes, press releases, and financial reports | Very large | Very easy | Here's the S&P 500 |
| Economic Research at the St. Louis Fed | St. Louis Fed | Money & Banking, Population, Employment, Production, Prices, International Data, Academic data (including the NBER Macrohistory database) | Very large: 509,000 US and international time series from 87 sources | | Check out their categories for a breakdown of their datasets |