In order to be the best possible predictor — to climb to the top of the rankings and establish yourself as a Metaculus Time Lord — you'll need to gather as much data as possible, filter out all the noise, and distill it down to a series of insightful projections that can pinpoint the future with laser-like accuracy. It's not an easy task, but by making smart use of the resources presented here you should be well on your way.
Table of contents: Analysis tools; Tutorials, textbooks and other resources; Tips on how to become a better predictor; Data sources
Analysis tools
- Guesstimate: a simple web-based tool to model uncertainties in calculations. Guesstimate's interface is similar to other spreadsheet tools, such as Excel or Google Sheets. Each model is a grid of cells, and each cell can be filled with a name and value. Functions can be used to connect cells together to represent more complex quantities.
For example, consider the question series about the Fermi paradox. We may use the Drake equation (a "back of the envelope" estimate of whether there is intelligent life in the Milky Way other than us humans) to estimate the number of intelligent civilizations in our galaxy based on 7 different variables (see Drake equation). Each guess has its own uncertainty, and with Guesstimate you can multiply the guesses and their uncertainties together to get a probability distribution over the number of intelligent civilizations. See the following model by a Guesstimate user on this probability. Also check out public models, and don't forget to post your models in the comments of questions for others to see!
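If you would like to see what "multiplying guesses and their uncertainties together" looks like outside of Guesstimate, here is a minimal Monte Carlo sketch in Python. It is not Guesstimate's implementation. It assumes each guess is a 90% interval of a lognormal distribution, and every range in `FACTORS` is a made-up illustrative number rather than a value taken from any published model.

```python
import math
import random
import statistics

Z95 = 1.6448536269514722  # 95th percentile of the standard normal


def sample_lognormal(low, high, rng):
    """Draw from a lognormal whose 5th and 95th percentiles are low and high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z95)
    return rng.lognormvariate(mu, sigma)


# Hypothetical 90% intervals for the seven Drake factors (illustrative only).
# The third entry marks fractions, which are capped at 1 after sampling.
FACTORS = {
    "R_star: new stars per year": (1.0, 10.0, False),
    "f_p: fraction of stars with planets": (0.3, 1.0, True),
    "n_e: habitable planets per system": (0.1, 3.0, False),
    "f_l: fraction that develop life": (0.001, 1.0, True),
    "f_i: fraction that develop intelligence": (0.001, 1.0, True),
    "f_c: fraction that communicate": (0.01, 1.0, True),
    "L: years a civilization communicates": (100.0, 1e7, False),
}


def simulate_drake(n_samples=10_000, seed=0):
    """Return n_samples draws of N, the product of the seven sampled factors."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_samples):
        n = 1.0
        for low, high, is_fraction in FACTORS.values():
            value = sample_lognormal(low, high, rng)
            if is_fraction:
                value = min(value, 1.0)
            n *= value
        draws.append(n)
    return draws


samples = simulate_drake()
cuts = statistics.quantiles(samples, n=20)  # 19 cut points: 5%, 10%, ..., 95%
low5, high95 = cuts[0], cuts[-1]
median = statistics.median(samples)
share_below_one = sum(s < 1 for s in samples) / len(samples)

print(f"median N: {median:.3g}")
print(f"90% interval: {low5:.3g} to {high95:.3g}")
print(f"share of draws with N < 1: {share_below_one:.2f}")
```

The output is a whole distribution rather than a single number, which is what you want when entering a prediction: the interval and the share of draws below 1 both depend heavily on the ranges you assume.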
- Spreadsheets such as Excel or Google Sheets for both theoretical modelling and basic statistical analysis. Spreadsheets offer similar options to Guesstimate: you can create theoretical models to factorize questions, produce estimates for subquestions, and run basic Monte Carlo simulations (see here for an example of such a simulation). Basic statistical analysis (descriptive statistics, correlations, regressions and so on) is also convenient in Excel (see here for more information). Finally, spreadsheets created on Google Sheets can be shared in the comments, to allow others to view your work.
- Statistical Software, like R, for more advanced statistical computing (linear and nonlinear modeling, classic statistical tests, time-series analysis, classification, clustering) and graphics. You can download it here for free.
- Probability Distribution Calculators such as the Normal distribution calculator, the Binomial distribution calculator, and the Poisson distribution calculator. See also our (upcoming) page on common probability distributions for an interactive tutorial on using distributions to make predictions. Lastly, check out this Bayes Rule Calculator for updating your credence for yes/no questions given new information.
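The calculations behind these distribution calculators are short enough to reproduce yourself. The sketch below uses only the Python standard library; the scenarios and numbers in it are hypothetical examples, not taken from any Metaculus question.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, rate):
    """Probability of exactly k events when the expected count is rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


# Normal: chance a quantity with mean 100 and sd 15 ends up at or below 110.
p_normal = NormalDist(mu=100, sigma=15).cdf(110)

# Binomial: chance of at least 3 successes in 10 trials with p = 0.2 each.
p_binomial = 1 - sum(binomial_pmf(k, 10, 0.2) for k in range(3))

# Poisson: chance of at least one event in a year, given 0.5 events per year
# on average.
p_poisson = 1 - poisson_pmf(0, 0.5)

print(f"normal:   {p_normal:.3f}")
print(f"binomial: {p_binomial:.3f}")
print(f"poisson:  {p_poisson:.3f}")
```

The Poisson case is often the handiest for yes/no questions of the form "will at least one X happen by date Y", provided a historical rate is a reasonable guide.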
Tutorials, textbooks and other resources
- Join Replication Markets, a contest where users forecast whether various studies will replicate. This is a great way to hone your prediction skills, to further behavioral and social scientific knowledge, and to maybe even take home some of the $100,000 in prize money.
- Play Calibrate Your Judgment, an interactive calibration tutorial produced by the Open Philanthropy Project. This is perhaps the most useful free online calibration training currently available. Note that you must sign in with a GuidedTrack, Facebook, or Google account, so that the application can track your performance over time.
- AI Impacts' Evidence on good forecasting practices from the Good Judgment Project summarises the findings of the Good Judgment Project, the winning team in IARPA’s 2011-2015 forecasting tournament. The article describes the various correlates of successful forecasting as well as the heuristics, forecasting methodologies, philosophical outlooks, and thinking styles that were associated with better predictions. Furthermore, it includes a helpful "recipe" for making predictions that describes how superforecasters (top 2% of forecasters) go about making their predictions.
- Forecasting: Principles and Practice provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly. The book is easy to read, concise, and presumes only basic knowledge of statistics.
The book presents key concepts of forecasting, from judgmental forecasting (which can be useful when you have little or no data) to simple and multiple regression, time series decomposition, exponential smoothing (ETS), and a few more advanced topics such as neural networks (all in R). The book is optimised for providing useful advice on the making of predictions, and does not attempt to give a thorough discussion of the theoretical details behind each method.
- Open Textbooks on Forecasting and Related Courses by Francis Diebold, and especially his Time-Series Econometrics: Forecasting, which provides an upper-level undergraduate / masters-level introduction to forecasting, broadly defined to include all aspects of predictive modeling, in economics and related fields. Having used this book for my macroeconometrics course, I highly recommend it, especially for the modelling of autoregressive processes for making point and density forecasts (which are especially useful for numeric-range predictions on Metaculus).
The topics covered include: regression from a predictive viewpoint; conditional expectations vs. linear projections; decision environment and loss function; the forecast object, statement, horizon and information set; the parsimony principle; relationships among point, interval and density forecasts; and much more. The book can be found here, and the lecture slides covering material in the book can be found here. Diebold's resources are licensed under Creative Commons.
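To give a flavour of what a point and density forecast from an autoregressive process looks like, here is a minimal AR(1) sketch in Python. It is not code from Diebold's book. It assumes Gaussian errors, ignores uncertainty in the estimated parameters, and is fitted to simulated data, so treat the resulting interval as illustrative and somewhat too narrow.

```python
import math
import random
from statistics import NormalDist, mean


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1] + error; returns (c, phi, sigma)."""
    x, y = series[:-1], series[1:]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = y_bar - phi * x_bar
    residuals = [yi - (c + phi * xi) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))
    return c, phi, sigma


def forecast_ar1(last_value, c, phi, sigma, horizon):
    """Return the Gaussian forecast density `horizon` steps ahead."""
    point, variance = last_value, 0.0
    for _ in range(horizon):
        point = c + phi * point                    # iterate the point forecast
        variance = phi**2 * variance + sigma**2    # error variance accumulates
    return NormalDist(mu=point, sigma=math.sqrt(variance))


# Simulated data standing in for a real time series (hypothetical parameters).
rng = random.Random(1)
series = [10.0]
for _ in range(199):
    series.append(2.0 + 0.8 * series[-1] + rng.gauss(0, 1))

c, phi, sigma = fit_ar1(series)
density = forecast_ar1(series[-1], c, phi, sigma, horizon=4)
low, high = density.inv_cdf(0.05), density.inv_cdf(0.95)

print(f"estimated phi: {phi:.2f}")
print(f"4-step point forecast: {density.mean:.2f}")
print(f"90% interval: {low:.2f} to {high:.2f}")
```

The point forecast is the centre of the density, and the 90% interval is the kind of range you could enter for a numeric-range question.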
Tips on how to become a better predictor
Avoid overconfidence. Overconfidence is a common finding in the forecasting research literature, and was found in a 2016 analysis of Metaculus predictions. Overconfidence comes in many forms, such as overconfidence in intuitive judgements, explicit models, or (your own or others') domain-specific expertise.
Generally overconfidence leads people to:
- neglect decision aids or other assistance, thereby increasing the likelihood of a poor decision. In experimental studies of postdiction in which subjects were each provided with decision aids, subject-level expertise (and thereby confidence) was found to be correlated with less use of reliable decision aids, and with worse predictions overall.
- make predictions contrary to the base rate. The base rate is the prevalence of a condition in the population under investigation. To expect the future to be substantially different from the past, one must have good evidence that i) some process crucial to bringing the usual result about will fail, and ii) the replacement process will produce a different outcome. Bayes' rule teaches us that to predict unlikely events we must have highly diagnostic information (information that you'd be unlikely to observe in the usual case). Yet predictors often rely on their confidence, rather than on the diagnosticity of their evidence, when going against the base rate.
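A small worked example makes the point about diagnosticity concrete. The sketch below applies Bayes' rule to a hypothetical 5% base rate, once with weak evidence and once with highly diagnostic evidence; all of the probabilities are invented for illustration.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after observing the evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator


base_rate = 0.05  # hypothetical: the event occurs in 5% of comparable cases

# Weak evidence: only twice as likely if the event is coming (0.8 vs 0.4).
weak = bayes_update(base_rate, 0.8, 0.4)

# Diagnostic evidence: forty times as likely if the event is coming (0.8 vs 0.02).
strong = bayes_update(base_rate, 0.8, 0.02)

print(f"posterior after weak evidence:       {weak:.3f}")
print(f"posterior after diagnostic evidence: {strong:.3f}")
```

With the weak evidence the posterior stays below 10%, however compelling the evidence may feel; only the evidence that would rarely show up in the usual case moves the prediction above 50%.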
To counteract overconfidence, forecasters should heed five principles: (1) Consider alternatives, especially in novel or unprecedented situations for which data is lacking; (2) List reasons why the forecast might be wrong; (3) In group interaction, appoint a devil’s advocate (or play the devil's advocate in the comment section!); (4) Obtain feedback about predictions (by posting them in the comments, for example); (5) Treat the feedback you receive as valuable information.
- Break seemingly intractable problems into tractable sub-problems. This is Fermi-style thinking. Enrico Fermi designed the first atomic reactor. When he wasn’t doing that he loved to tackle challenging questions such as “How many piano tuners are in Chicago?” At first glance, this seems very difficult. Fermi started by decomposing the problem into smaller parts and putting them into the buckets of knowable and unknowable. By working at a problem this way you expose what you don’t know or, as Tetlock (2016) puts it, you “flush ignorance into the open.”
- Discover the relevant base rate. A Metaculus Time Lord knows that there is nothing truly new under the sun. So, the best forecasters often conduct creative searches for comparison classes even for seemingly unique events and pose the question: How often do things of this sort happen in situations of this sort? Identify comparison classes for events, and let your predictions be informed by the base rate of occurrence in this class of events. This is often easier and more effective than trying to understand the event's workings from first principles.
- Combine a systematic ‘model-thinking’ approach with an intuition-based approach. Whilst it is often good to use a systematic ‘model-thinking’ approach that relies on explicit theoretical or statistical reasoning, you should generally also use an intuition-based approach to predicting. When these two approaches yield different answers, think carefully about whether your question is the type of question that is better answered with intuitive judgments or with systematic modelling, and combine the two answers accordingly to inform your prediction. According to Kahneman, intuitive judgements about some subject are likely to be accurate only when the following three conditions hold:
- The relevant subject exhibits a large degree of regularity
- One has had a sufficient amount of exposure to this subject to have been able to pick up the relevant regularities
- One has received enough feedback to evaluate previous intuitive judgments
- Look for the errors behind your mistakes. It’s easy to justify or rationalize your failure. Don’t. Own it, evaluate your track record (both resolution and calibration), and compare it to the community track record. You want to learn where you went wrong and determine ways to get better. And don’t just look at failures. Evaluate successes as well so you can determine whether you used reliable techniques for producing forecasts or whether you were just plain lucky. For example, if you have an average log-score above 0.2, this might be evidence of overconfidence, in which case you should follow the tips on counteracting overconfidence presented above.
- Share your work in the question's comments section. Sharing your theoretical reasoning (such as posting your Guesstimate model), statistical reasoning, information/data sources, or dependencies with others is good practice not just because you’re providing a valuable public good for our understanding of the future, but also because others may supplement your work with additional insight.
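If you keep your own record of resolved yes/no predictions, evaluating it takes only a few lines. The sketch below computes a Brier score and a simple calibration table using textbook definitions, which are not necessarily how Metaculus computes its own scores, and the `forecasts` list is made-up example data.

```python
from collections import defaultdict


def brier_score(forecasts):
    """Mean squared gap between forecast probability and outcome (lower is better)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


def calibration_table(forecasts, n_bins=5):
    """Group forecasts into probability bins; return (mean forecast, observed frequency, count)."""
    bins = defaultdict(list)
    for p, outcome in forecasts:
        index = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes in the top bin
        bins[index].append((p, outcome))
    table = []
    for index in sorted(bins):
        items = bins[index]
        mean_forecast = sum(p for p, _ in items) / len(items)
        observed = sum(outcome for _, outcome in items) / len(items)
        table.append((mean_forecast, observed, len(items)))
    return table


# Hypothetical track record: (probability given to "yes", 1 if it resolved yes else 0).
forecasts = [
    (0.9, 1), (0.9, 1), (0.9, 0), (0.8, 1), (0.7, 0),
    (0.6, 1), (0.3, 0), (0.2, 0), (0.1, 0), (0.1, 1),
]

score = brier_score(forecasts)
table = calibration_table(forecasts)

print(f"Brier score: {score:.3f}")
for mean_forecast, observed, count in table:
    print(f"forecast ~{mean_forecast:.2f}: happened {observed:.2f} of the time (n={count})")
```

A well-calibrated record has observed frequencies close to the mean forecast in each row. Forecasts near 0 or 1 that resolve the wrong way more often than stated are the typical signature of overconfidence, though with only a handful of predictions per bin the table is mostly noise.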
General Data Sources (in no particular order)
|Data Service||Organization||Topics||Size||Ease of Use||Comments|
|Public Data Explorer||Google||All topics||Very large. Public Data Explorer aggregates public data from 113 dataset providers (such as international organizations, national statistical offices, non-governmental organizations, and research institutions)||This is a good place to start your search for data, since many datasets are available and are often straightforward to find. There are sometimes also great visualizations||This is perhaps the best place to look for public data and forecasts from third-party data providers. Highly recommended also is the International Futures Forecasting Data on long-term forecasting and global trend analysis, available on the Public Data Explorer|
|Our World in Data||The Oxford Martin Programme on Global Development at the University of Oxford||Global living conditions: Health, Food Provision, The Growth and Distribution of Incomes, Violence, Rights, Wars, Culture, Energy Use, Education, and Environmental Changes||Small. Our World in Data aggregates some hundreds of datasets, all of which are organized well and given appropriate context||There are excellent visualizations. For each topic the quality of the data is discussed and, by pointing the visitor to the sources, this website is also a database of databases. Covering all of these aspects in one resource makes it possible to understand how the observed long-run trends are interlinked||Highly recommended for big picture questions about the human condition|
|Data.gov||Various branches of the U.S. Government||Agriculture, Climate, Consumer, Education, Energy, Finance, Health, Manufacturing, Public Safety, Science and Research||Very Large. Over 285,000 datasets from most federal departments, city governments, universities, NGOs and the private sector||You do need to enter good search queries to get a short list of relevant results||You can really find data on almost anything|
|The World Bank Open Data||The World Bank||Agriculture & Rural Development, Aid Effectiveness, Climate Change, Economy & Growth, Education, Energy & Mining, Environment, Financial Sector, Gender, Health, Infrastructure, Poverty, Science & Technology, Social Development, Trade, Urban Development||Large. 17,445 datasets available||Easy||Their datasets on Science & Technology might be especially relevant for Metaculus questions|
|UNData||United Nations Statistics Division||Agriculture, Crime, Education, Employment, Energy, Environment, Health, HIV/AIDS, Human Development, Industry, Information and Communication Technology, National Accounts, Population, Refugees, Tourism, Trade, as well as the Millennium Development Goals indicators||Large||Very Easy||Very intuitive interface for dataset searching|
|Global Health Observatory Data Repository||The World Health Organization||Health-related topics||Moderately large. 1000 indicators for its 194 member states||You can browse this data by theme, category, or indicator||Excellent for health-related questions, such as those involving pandemics, antimicrobial resistance, and malaria|
|OECDstat||Organisation for Economic Co-operation and Development (OECD)||Technology and Patents, Development, Environment, Globalisation, Finance, Health, Industry, Information and Communication Technology, Productivity, Social Protection and Wellbeing, Transport, and more||Very easy. Their online statistical database permits Google-like keyword search|
Macroeconomic & Financial Only Data Sources (in no particular order)
|Data Service||Organization||Topics||Size||Ease of Use||Comments|
|Bureau of Economic Analysis||U.S. Department of Commerce||Official macroeconomic and industry statistics, most notably reports about the gross domestic product (GDP) of the United States, as well as personal income, corporate profits and government spending||Large||Easy|
|Yahoo Finance||Yahoo||Financial news, data and commentary including stock quotes, press releases, financial reports||Very Large||Very Easy||Here's the S&P 500|
|Economic Research at the St. Louis Fed||St. Louis Fed||Money & Banking, Population, Employment, Production, Prices, International Data, Academic data (including the NBER Macrohistory database)||Very Large. 509,000 US and international time series from 87 sources||Check out their categories for a breakdown of their datasets|