In order to be the best possible predictor — to climb to the top of the rankings and establish yourself as a Metaculus Time Lord — you'll need to gather as much data as possible, filter out all the noise, and distill it down to a series of insightful projections that can pinpoint the future with laser-like accuracy. It's not an easy task, but by making smart use of the resources presented here you should be well on your way.
Table of Contents: Analysis Tools, Textbooks, Data Sources, General Advice
Guesstimate: a simple web-based tool to model uncertainties in calculations. Guesstimate's interface is similar to other spreadsheet tools, such as Excel or Google Sheets. Each model is a grid of cells, and each cell can be filled with a name and value. Functions can be used to connect cells together to represent more complex quantities.
For example, consider the question series about the Fermi paradox. We may use the Drake equation (a "back of the envelope" estimate of whether there is intelligent life in the Milky Way other than us humans) to estimate the number of intelligent civilizations in our galaxy based on 7 different variables (see Drake equation). Each guess has its own uncertainty, and with Guesstimate you can multiply the guesses and their uncertainties together to get a probability distribution over the number of intelligent civilizations. See the following model by a Guesstimate user on this probability. Also check out public models, and don't forget to post your models in the comments of questions for others to see!
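The same kind of calculation can be sketched outside Guesstimate. The Python below is a minimal Monte Carlo illustration, not the Guesstimate user's model: the ranges in `DRAKE_RANGES` are hypothetical placeholders, and the choice of a log-uniform distribution for every factor is an assumption made only to keep the example short.

```python
import math
import random


def log_uniform(rng, low, high):
    """Draw a sample whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


# Hypothetical (low, high) ranges for the seven Drake factors.
# They are illustrative placeholders, not estimates taken from any source.
DRAKE_RANGES = {
    "R_star": (1.0, 10.0),       # new stars formed per year
    "f_p": (0.1, 1.0),           # fraction of stars with planets
    "n_e": (0.1, 5.0),           # habitable planets per planetary system
    "f_l": (0.001, 1.0),         # fraction of those where life appears
    "f_i": (0.001, 1.0),         # fraction of those where intelligence evolves
    "f_c": (0.01, 1.0),          # fraction that become detectable
    "L": (100.0, 1_000_000.0),   # years a civilization stays detectable
}


def simulate_drake(n_samples=10_000, seed=0):
    """Return n_samples draws of N, the product of the seven sampled factors."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        n = 1.0
        for low, high in DRAKE_RANGES.values():
            n *= log_uniform(rng, low, high)
        samples.append(n)
    return samples


def summarize(samples):
    """Summarize the simulated distribution with a few percentiles."""
    ordered = sorted(samples)

    def percentile(p):
        return ordered[int(p * (len(ordered) - 1))]

    return {
        "p05": percentile(0.05),
        "median": percentile(0.50),
        "p95": percentile(0.95),
        # Share of draws with at least one detectable civilization.
        "prob_at_least_one": sum(s >= 1.0 for s in ordered) / len(ordered),
    }


if __name__ == "__main__":
    print(summarize(simulate_drake()))
```

Because the factors multiply, the result is spread over many orders of magnitude, so percentiles are usually more informative here than a mean.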
- Excel for both theoretical modelling and basic statistical analysis. Excel offers similar options to Guesstimate: you can create theoretical models to break questions into factors, produce estimates for subquestions, and run basic Monte Carlo simulations (see here for an example of such a simulation). Basic statistical analysis (descriptive statistics, correlations, regressions and so on) is also convenient in Excel (see here for more information).
- Statistical Software, like R, for more advanced statistical computing (linear and nonlinear modeling, classic statistical tests, time-series analysis, classification, clustering) and graphics. You can download it here for free.
- Probability Distribution Calculators such as the Normal distribution calculator, the Binomial distribution calculator, and the Poisson distribution calculator. See also our (upcoming) page on common probability distributions for an interactive tutorial on using distributions to make predictions. Lastly, check out this Bayes Rule Calculator for updating your credence for yes/no questions given new information.
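What the distribution calculators compute can also be reproduced in a few lines. The sketch below uses only Python's standard library; the function names are our own and do not correspond to any of the linked calculators.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, rate):
    """Probability of exactly k events when the expected count is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def poisson_at_least(k, rate):
    """Probability of k or more events, via the complement of 0..k-1."""
    return 1.0 - sum(poisson_pmf(i, rate) for i in range(k))


def normal_cdf(x, mean=0.0, sd=1.0):
    """Probability that a normal variable falls at or below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


if __name__ == "__main__":
    # Example: an event type averages 2 occurrences per year.
    # How likely are 3 or more occurrences next year?
    print(poisson_at_least(3, 2.0))
```

A question of the form "will there be at least k events by date X?" often maps onto `poisson_at_least`, provided the events are roughly independent and the rate is stable. Both conditions are modelling assumptions that should be checked against the data.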
Forecasting: Principles and Practice provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly. The book is easy to read, concise, and presumes only basic knowledge of statistics.
The book presents key concepts of forecasting, from judgmental forecasting (which can be useful when you have little or no data) to simple and multiple regression, time series decomposition, exponential smoothing (ETS), and a few more advanced topics such as neural networks (all in R). The book is optimised for providing useful advice on making predictions, and does not attempt a thorough discussion of the theoretical details behind each method.
General Data Sources (in no particular order)
|Data Service||Organization||Topics||Size||Ease of Use||Comments|
|Public Data Explorer|| ||All topics||Very large. Public Data Explorer aggregates public data from 113 dataset providers (such as international organizations, national statistical offices, non-governmental organizations, and research institutions).||This is a good place to start your search for data, since many datasets are available and are often straightforward to find. There are sometimes also great visualizations.||This is perhaps the best place to look for public data and forecasts from third-party data providers. Also highly recommended is the International Futures Forecasting Data on long-term forecasting and global trend analysis, available on the Public Data Explorer.|
|Our World in Data||The Oxford Martin Programme on Global Development at the University of Oxford||Global living conditions: Health, Food Provision, The Growth and Distribution of Incomes, Violence, Rights, Wars, Culture, Energy Use, Education, and Environmental Changes||Small. Our World in Data aggregates some hundreds of datasets, all of which are organized well and given appropriate context.||There are excellent visualizations. For each topic the quality of the data is discussed and, by pointing the visitor to the sources, this website is also a database of databases. Covering all of these aspects in one resource makes it possible to understand how the observed long-run trends are interlinked.||Highly recommended for big picture questions about the human condition|
|Data.gov||Various branches of the U.S. Government||Agriculture, Climate, Consumer, Education, Energy, Finance, Health, Manufacturing, Public Safety, Science and Research||Very Large. Over 285,000 datasets from most federal departments, city governments, universities, NGOs and the private sector.||You need to enter good search queries to get a short list of relevant results.||You can really find data on almost anything|
|The World Bank Open Data||The World Bank||Agriculture & Rural Development, Aid Effectiveness, Climate Change, Economy & Growth, Education, Energy & Mining, Environment, Financial Sector, Gender, Health, Infrastructure, Poverty, Science & Technology, Social Development, Trade, Urban Development||Large. 17,445 datasets available.||Easy||Their datasets on Science & Technology might be especially relevant for Metaculus questions|
|UNData||United Nations Statistics Division||Agriculture, Crime, Education, Employment, Energy, Environment, Health, HIV/AIDS, Human Development, Industry, Information and Communication Technology, National Accounts, Population, Refugees, Tourism, Trade, as well as the Millennium Development Goals indicators||Large||Very Easy||Very intuitive interface for dataset searching|
|Global Health Observatory Data Repository||The World Health Organization||Health-related topics||Moderately large. 1000 indicators for its 194 member states.||You can browse this data by theme, category, or indicator.||Excellent for health-related questions, such as those involving pandemics, antimicrobial resistance, and malaria|
|OECDstat||Organisation for Economic Co-operation and Development (OECD)||Technology and Patents, Development, Environment, Globalisation, Finance, Health, Industry, Information and Communication Technology, Productivity, Social Protection and Wellbeing, Transport, and more|| ||Very easy. Their online statistical database permits Google-like keyword search.|| |
Macroeconomic & Financial Only Data Sources (in no particular order)
|Data Service||Organization||Topics||Size||Ease of Use||Comments|
|Bureau of Economic Analysis||U.S. Department of Commerce||Official macroeconomic and industry statistics, most notably reports about the gross domestic product (GDP) of the United States, as well as personal income, corporate profits and government spending||Large||Easy|
|Yahoo Finance||Yahoo||Financial news, data and commentary including stock quotes, press releases, financial reports||Very Large||Very Easy||Here's the S&P 500|
|Economic Research at the St. Louis Fed||St. Louis Fed||Money & Banking, Population, Employment, Production, Prices, International Data, Academic data (including the NBER Macrohistory database)||Very Large. 509,000 US and international time series from 87 sources.||Check out their categories for a breakdown of their datasets.|| |
- Combine a systematic ‘model-thinking’ approach with an intuition-based approach. Whilst it is often good to use a systematic ‘model-thinking’ approach that relies on explicit theoretical or statistical reasoning, you should generally also use an intuition-based approach to predicting. When these two approaches yield different answers, think carefully about whether your question is the type of question that is better answered with intuitive judgments or with systematic modelling, and combine the two answers accordingly to inform your prediction. For more information, see for example this article on how to adjudicate between intuitive judgments and those produced by explicit reasoning.
- Avoid overconfidence in explicit models. Overconfidence is a common finding in the forecasting research literature, and was found to be present in a 2016 analysis of Metaculus predictions.
Those using quantitative models often produce overconfident forecasts because the models overlook key sources of uncertainty. For example, measures of uncertainty typically do not account for the uncertainty in the forecasts of the causal variables in an econometric model.
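To make the econometric example concrete, here is a minimal sketch under strong assumptions that we introduce for illustration: a linear model y = a + b·x + noise, normally distributed errors, a forecast of the driver x whose error is independent of the noise, and coefficients treated as known exactly (which itself understates uncertainty). The function name and parameters are hypothetical.

```python
import math


def interval_half_width(slope, residual_sd, driver_forecast_sd=0.0, z=1.96):
    """Half-width of an approximate 95% forecast interval for y.

    residual_sd        : standard deviation of the model's own noise term
    driver_forecast_sd : standard deviation of the error in the forecast of x
                         (0.0 reproduces the common shortcut of treating the
                         future value of x as known)
    """
    # Independent error sources add in variance, and the driver error is
    # scaled by the slope before it reaches y.
    total_sd = math.sqrt(residual_sd**2 + (slope * driver_forecast_sd) ** 2)
    return z * total_sd


if __name__ == "__main__":
    naive = interval_half_width(slope=2.0, residual_sd=1.0)
    honest = interval_half_width(slope=2.0, residual_sd=1.0, driver_forecast_sd=1.5)
    print(f"x treated as known: +/- {naive:.2f}")
    print(f"x itself forecast:  +/- {honest:.2f}")
```

With these illustrative numbers the interval that accounts for uncertainty in x is roughly three times wider than the naive one, which is the kind of gap that makes model-based forecasts look more precise than they are.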
Generally overconfidence leads people to:
(1) neglect decision aids or other assistance, thereby increasing the likelihood of a poor decision. In experimental studies of postdiction in which each subject was provided with decision aids, subject-level expertise (and thereby confidence) was found to be correlated with lower use of reliable decision aids, and with worse predictions overall (see here).
(2) make predictions contrary to the base rate. The base rate is the prevalence of a condition in the population under investigation. Bayes' theorem teaches us that to predict unlikely events we must have highly diagnostic information, yet predictors often rely on their own confidence rather than on the diagnosticity of the evidence when going against the base rate.
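The base-rate point can be checked numerically with the odds form of Bayes' theorem (posterior odds = prior odds × likelihood ratio). The sketch below is illustrative only; the function names are ours, and the likelihood ratio (how much more probable the evidence is if the event will happen than if it will not) is something the forecaster has to estimate.

```python
def posterior_probability(base_rate, likelihood_ratio):
    """Update a base rate with evidence of the given likelihood ratio."""
    prior_odds = base_rate / (1.0 - base_rate)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def required_likelihood_ratio(base_rate, target_probability):
    """Likelihood ratio needed to move from the base rate to the target."""
    prior_odds = base_rate / (1.0 - base_rate)
    target_odds = target_probability / (1.0 - target_probability)
    return target_odds / prior_odds


if __name__ == "__main__":
    # Example: an event with a 2% base rate and evidence that is five times
    # more likely if the event is going to happen.
    print(posterior_probability(0.02, 5.0))
    # How diagnostic would evidence need to be to justify a 50% forecast?
    print(required_likelihood_ratio(0.02, 0.5))
```

With a 2% base rate, evidence needs a likelihood ratio of about 49 before a 50% forecast is justified. Feeling confident is not a substitute for evidence that strong.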
To counteract overconfidence forecasters should heed five principles: (1) Consider alternatives, especially in novel or unprecedented situations for which data is lacking; (2) List reasons why the forecast might be wrong; (3) In group interaction, appoint a devil’s advocate (or play the devil's advocate in the comment section!); (4) Make an explicit prediction (i.e. post your prediction in the comments) and then obtain feedback; (5) Treat the feedback you receive as valuable information.
- Share your work in the question's comments section. Sharing your theoretical reasoning (such as posting your Guesstimate model), statistical reasoning, information/data sources, or dependencies with others is good practice not just because you’re providing a valuable public good for our understanding of the future, but also because others may supplement your work with additional insight.