
Prediction Resources

In order to be the best possible predictor — to climb to the top of the rankings and establish yourself as a Metaculus Time Lord — you'll need to gather as much data as possible, filter out all the noise, and distill it down to a series of insightful projections that can pinpoint the future with laser-like accuracy. It's not an easy task, but by making smart use of the resources presented here you should be well on your way.

Please note that this page is a continual work in progress! If you have a useful resource to add, let us know in the associated discussion comments or send us a note at support@metaculus.com.

Table of Contents

Analysis Tools
Textbooks
Data Sources
General Advice

Analysis Tools

  • Guesstimate: a simple web-based tool to model uncertainties in calculations. Guesstimate's interface is similar to other spreadsheet tools, such as Excel or Google Sheets. Each model is a grid of cells, and each cell can be filled with a name and value. Functions can be used to connect cells together to represent more complex quantities.

    For example, consider the question series about the Fermi paradox. We may use the Drake equation (a "back of the envelope" estimate of whether there is intelligent life in the Milky Way other than us humans) to estimate the number of intelligent civilizations in our galaxy from its seven variables. Each guess has its own uncertainties, and with Guesstimate you can multiply the guesses and their uncertainties together to get a probability distribution over the number of intelligent civilizations (a Monte Carlo sketch of the same calculation appears after this list). See the following model by a Guesstimate user on this probability. Also check out public models, and don't forget to post your models in the comments of questions for others to see!

  • Excel, for both theoretical modelling and basic statistical analysis. Excel offers similar options to Guesstimate: you can create theoretical models to factorize questions, produce estimates for subquestions, and run basic Monte Carlo simulations (see here for an example of such a simulation). Basic statistical analysis (descriptive statistics, correlations, regressions, and so on) is also convenient in Excel (see here for more information).

  • Statistical Software, like R, for more advanced statistical computing (linear and nonlinear modeling, classic statistical tests, time-series analysis, classification, clustering) and graphics. You can download it here for free.

  • Probability Distribution Calculators such as the Normal distribution calculator, the Binomial distribution calculator, and the Poisson distribution calculator. See also our (upcoming) page on common probability distributions for an interactive tutorial on using distributions to make predictions. Lastly, check out this Bayes Rule Calculator for updating your credence on yes/no questions given new information. (A short scripted version of the distribution calculations appears after this list.)
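
If you prefer code to a spreadsheet, the same kind of uncertainty propagation that Guesstimate performs can be sketched in a few lines of Python. This is only a minimal sketch: the distributions below are illustrative assumptions chosen for the example, not endorsed estimates of the Drake parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # number of Monte Carlo samples

# Drake equation factors, each drawn from a rough, purely illustrative distribution.
R_star = rng.lognormal(np.log(2), 0.5, n)     # star formation rate per year
f_p    = rng.uniform(0.5, 1.0, n)             # fraction of stars with planets
n_e    = rng.lognormal(np.log(1), 0.7, n)     # habitable planets per star with planets
f_l    = rng.uniform(0.0, 1.0, n)             # fraction of those developing life
f_i    = rng.uniform(0.0, 1.0, n)             # fraction of those developing intelligence
f_c    = rng.uniform(0.0, 1.0, n)             # fraction emitting detectable signals
L      = rng.lognormal(np.log(1e4), 1.5, n)   # years a civilization stays detectable

N = R_star * f_p * n_e * f_l * f_i * f_c * L  # civilizations in the Milky Way, per sample

print(f"median: {np.median(N):,.0f}")
print(f"80% interval: {np.percentile(N, 10):,.1f} to {np.percentile(N, 90):,.0f}")
print(f"P(N < 1): {np.mean(N < 1):.2f}")
```

As with a Guesstimate model, the interesting output is the spread of the distribution rather than any single number, and posting a script like this in the comments lets others tweak the inputs.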
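Similarly, the calculations behind the distribution calculators above can be reproduced with scipy.stats; the rates and thresholds in this sketch are made up purely for illustration.

```python
from scipy import stats

# Poisson: an event has historically occurred about 3 times per year on average.
# Probability of seeing 5 or more occurrences next year:
print(1 - stats.poisson.cdf(4, mu=3))              # ~0.185

# Binomial: probability of at least 7 successes in 10 independent trials,
# each with success probability 0.5:
print(1 - stats.binom.cdf(6, n=10, p=0.5))         # ~0.172

# Normal: probability that a quantity distributed N(100, 15) exceeds 130:
print(1 - stats.norm.cdf(130, loc=100, scale=15))  # ~0.023
```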

Textbooks

Data Sources

General Data Sources (in no particular order)

  • Public Data Explorer (Google). Topics: all topics. Size: very large; it aggregates public data from 113 dataset providers, such as international organizations, national statistical offices, non-governmental organizations, and research institutions. Ease of use: very easy. This is a good place to start your search for data, since many datasets are available and are often straightforward to find, and there are sometimes also great visualizations. It is perhaps the best place to look for public data and forecasts provided by third-party data providers. The International Futures forecasting data on long-term forecasting and global trend analysis, also available on the Public Data Explorer, is highly recommended.

  • Our World in Data (The Oxford Martin Programme on Global Development at the University of Oxford). Topics: global living conditions, including Health, Food Provision, The Growth and Distribution of Incomes, Violence, Rights, Wars, Culture, Energy Use, Education, and Environmental Changes. Size: small; it aggregates some hundreds of datasets, all of which are well organized and given appropriate context. Ease of use: very easy. There are excellent visualizations; for each topic the quality of the data is discussed, and by pointing the visitor to its sources the site also serves as a database of databases. Covering all of these aspects in one resource makes it possible to understand how the observed long-run trends are interlinked. Highly recommended for big-picture questions about the human condition.

  • Data.gov (various branches of the U.S. Government). Topics: Agriculture, Climate, Consumer, Education, Energy, Finance, Health, Manufacturing, Public Safety, Science and Research. Size: very large, with over 285,000 datasets from most federal departments, city governments, universities, NGOs, and the private sector. Ease of use: moderately difficult; you need to enter good search queries to get a short list of relevant results. You can find data on almost anything here.

  • The World Bank Open Data (The World Bank). Topics: Agriculture & Rural Development, Aid Effectiveness, Climate Change, Economy & Growth, Education, Energy & Mining, Environment, Financial Sector, Gender, Health, Infrastructure, Poverty, Science & Technology, Social Development, Trade, and Urban Development. Size: large, with 17,445 datasets available. Ease of use: easy. Their datasets on Science & Technology may be especially relevant for Metaculus questions.

  • UNData (United Nations Statistics Division). Topics: Agriculture, Crime, Education, Employment, Energy, Environment, Health, HIV/AIDS, Human Development, Industry, Information and Communication Technology, National Accounts, Population, Refugees, Tourism, and Trade, as well as the Millennium Development Goals indicators. Size: large. Ease of use: very easy, with a very intuitive interface for dataset searching.

  • Global Health Observatory Data Repository (The World Health Organization). Topics: health-related. Size: moderately large, with 1,000 indicators for its 194 member states. Ease of use: easy; you can browse the data by theme, category, or indicator. Excellent for health-related questions, such as those involving pandemics, antimicrobial resistance, and malaria.

  • OECDstat (Organisation for Economic Co-operation and Development). Topics: Technology and Patents, Development, Environment, Globalisation, Finance, Health, Industry, Information and Communication Technology, Productivity, Social Protection and Wellbeing, Transport, and more. Ease of use: very easy; their online statistical database permits Google-like keyword search.
Macroeconomic & Financial Data Sources (in no particular order)

  • Bureau of Economic Analysis (U.S. Department of Commerce). Topics: official macroeconomic and industry statistics, most notably reports on the gross domestic product (GDP) of the United States, as well as personal income, corporate profits, and government spending. Size: large. Ease of use: easy.

  • Yahoo Finance (Yahoo). Topics: financial news, data, and commentary, including stock quotes, press releases, and financial reports. Size: very large. Ease of use: very easy. Here's the S&P 500.

  • Economic Research at the St. Louis Fed (St. Louis Fed). Topics: Money & Banking, Population, Employment, Production, Prices, International Data, and Academic data (including the NBER Macrohistory database). Size: very large, with 509,000 US and international time series from 87 sources. Ease of use: very easy. Check out their categories for a breakdown of their datasets.

General Advice

  • Combine a systematic ‘model-thinking’ approach with an intuition-based approach. Whilst it is often good to use a systematic ‘model-thinking’ approach built on explicit theoretical or statistical reasoning, you should generally also use an intuition-based approach to predicting. When the two approaches yield different answers, think carefully about whether your question is the type that is better answered with intuitive judgments or with systematic modelling, and weight the two answers accordingly to inform your prediction (a small sketch of one way to blend them appears at the end of this section). For more information, see for example this article on how to adjudicate between intuitive judgments and those produced by explicit reasoning.

  • Avoid overconfidence in explicit models. Overconfidence is a common finding in the forecasting research literature, and it was also found in a 2016 analysis of Metaculus predictions.

    Those using quantitative models often produce overconfident forecasts because the models overlook key sources of uncertainty. For example, reported uncertainty measures typically do not account for the uncertainty in the forecasts of the causal variables in an econometric model.

    Generally overconfidence leads people to:

    (1) neglect decision aids or other assistance, thereby increasing the likelihood of a poor decision. In experimental studies of postdiction in which subjects were provided with decision aids, greater subject-level expertise (and with it, confidence) was correlated with lower use of reliable decision aids and with worse predictions overall (see here).

    (2) make predictions contrary to the base rate. The base rate is the prevalence of a condition in the population under investigation. Bayes' theorem teaches us that to predict unlikely events we must have highly diagnostic information, yet predictors often rely on their confidence rather than on the diagnosticity of their evidence when going against the base rate (see the worked example at the end of this section).

    To counteract overconfidence forecasters should heed five principles: (1) Consider alternatives, especially in novel or unprecedented situations for which data is lacking; (2) List reasons why the forecast might be wrong; (3) In group interaction, appoint a devil’s advocate (or play the devil's advocate in the comment section!); (4) Make an explicit prediction (i.e. post your prediction in the comments) and then obtain feedback; (5) Treat the feedback you receive as valuable information.

  • Share your work in the question's comments section. Sharing your theoretical reasoning (such as posting your Guesstimate model), statistical reasoning, information/data sources, or dependencies with others is good practice not just because you’re providing a valuable public good for our understanding of the future, but also because others may supplement your work with additional insight.
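
To make the first piece of advice concrete, here is one simple way to blend a model-based probability with an intuitive one: average them in log-odds space, which behaves better near 0 and 1 than a plain average. This is only a minimal sketch; the 60/40 weighting is an arbitrary illustrative choice, and in practice the weights should reflect how much you trust each approach for the question at hand.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def blend(p_model: float, p_intuition: float, w_model: float = 0.6) -> float:
    """Weighted average of two probabilities in log-odds space."""
    return inv_logit(w_model * logit(p_model) + (1 - w_model) * logit(p_intuition))

# Model says 90%, gut says 60%; with 60% weight on the model the blend is about 81%.
print(round(blend(0.90, 0.60), 2))
```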
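To illustrate the base-rate point with made-up numbers: given a 1% base rate, even evidence that is right 90% of the time leaves the event more likely not to happen than to happen, which is why going against the base rate demands highly diagnostic information.

```python
def posterior(base_rate: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(event | positive signal), computed via Bayes' theorem."""
    p_signal = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
    return sensitivity * base_rate / p_signal

# 1% base rate, signal with a 90% hit rate and a 10% false-positive rate:
print(round(posterior(0.01, 0.90, 0.10), 3))  # 0.083, still well below 50%
```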