This essay was submitted to the AI Progress Essay Contest, an initiative that focused on the timing and impact of transformative artificial intelligence. You can read the results of the contest and the winning essays here.
This essay will explore the medium-term future of artificial intelligence, focusing on what Metaculus forecasts imply about AGI timelines and the state of AI-safety research.
While I tried to base this essay mostly on Metaculus' forecasts, many parts are influenced by my priors on the topics discussed. In a very simplified and unnuanced form, the main assumptions are the following:
AI safety, broadly defined, is an area of research dedicated to ensuring that AI deployment does not result in undesirable outcomes. The alignment problem in particular deals with how to construct AI systems whose goals and values are aligned with what we really want them to do. This is a very difficult problem, and it gets much harder as the intelligence of the system increases. A key insight into the severity of this issue is that, especially for non-evolved systems, intelligence and goals are orthogonal to each other. The space of possible minds is huge; we want to hit a vanishingly narrow and imprecise region within it, and we don't even know how to aim yet.
A naive answer to the alignment problem would be to directly specify a set of desirable objectives or conditions, rewarding the system when it meets the goal and punishing it otherwise. But objective specification is hard, even in simple environments. A reinforcement learner will naturally find solutions that optimize exactly what it was asked to optimize, not what we meant to ask. A similar phenomenon, much more familiar and bounded, also occurs among humans: Goodhart's law, under which a measure ceases to be a good measure once it becomes a target.
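A toy numerical illustration of this dynamic (my own sketch, not from the essay's sources, with all parameters chosen arbitrarily): if we select whichever candidate maximizes a noisy proxy of the true objective, the proxy score of the winner systematically overstates its true value.

```python
import random

def goodhart_gap(n_candidates=100, noise=1.0, trials=2000, seed=0):
    """Average gap between the proxy score and the true value of the
    candidate selected by maximizing the proxy. A positive gap means
    the optimizer is partly rewarding measurement error, not value."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # True values and noisy proxy observations (illustrative distributions).
        true_values = [rng.gauss(0, 1) for _ in range(n_candidates)]
        proxies = [v + rng.gauss(0, noise) for v in true_values]
        # Pick the candidate that looks best according to the proxy.
        best = max(range(n_candidates), key=lambda i: proxies[i])
        gaps.append(proxies[best] - true_values[best])
    return sum(gaps) / trials

gap = goodhart_gap()
print(f"mean(proxy - true) for the proxy-optimal pick: {gap:.2f}")
```

With a noiseless proxy the gap vanishes; the harder the selection pressure and the noisier the proxy, the larger the systematic overestimate. The same logic is what makes "optimize exactly what was specified" dangerous when the specification is only a proxy for what we want.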
Examples of specification gaming can be fun, as when an objective of high average speed is satisfied by evolving tall creatures that fall over, or when undesirable events are "avoided" only in the sense that the proxy sensory data is kept blind to the penalized outcomes. But these behaviors become less humorous, and frighteningly worrisome, when considering similar failure modes in hypothetical systems that are much more capable, especially at the human level or beyond. For a powerful enough AI, objective specification can be seen as being at least as hard as safely making a wish to a completely amoral, alien genie.
So, does the whole alignment problem reduce to finding a good enough specification? Not really: see for example the topics of mesa-optimization and distributional shift (see also Section 7 of this paper). Furthermore, for a powerful enough AI system, a 'good enough' specification would basically need to include or extrapolate a description of something like 'all human values and morality'. That already seems hard, but it gets worse when one considers that human morals are based on our particular ontological and epistemic representations of reality, which may diverge wildly from those learned by an AI. Moreover, human ethics are mostly based on finite intuitions, even though we probably live in an infinite universe of some type or another; that is quite a distributional shift, and one that we ourselves don't know how to deal with. Finally, the ethical intuitions of humans are not necessarily 'good', or coherent, or widely shared, and locking ourselves into current humanity's ethical intuitions could also be potentially disastrous.
So far, the Metaculus community does not expect the alignment problem, or other control methods, to be solved before the first AGI.
What are the risks of deploying an unaligned, generally human-level AI? If there is enough hardware overhang, even an initially slow AGI system could have huge impacts through its capacity to be copied massively or sped up. Furthermore, costs, efficiency, and capabilities will probably improve quickly after the first AGI is publicly presented.
Is having many human-level AIs a problem, even if they are unaligned? After all, human society already deals regularly with harmful human-level intelligences, handling criminal humans via law enforcement. But this analogy severely underestimates the otherness that a de novo AI will probably have. Even one based on artificial neural networks, and trained to imitate humans, will probably have less in common with a human than a human has with an arthropod, or with other creatures that are products of biological evolution. At the very least, one can imagine entities that are as good as a human at reasoning and planning, but which can reproduce extremely fast over digital substrates and which hold a completely alien set of values.
But the real worry is that, once we have artificial intelligence that is roughly at the level of humans, it won't be long before we have superhuman intelligence: systems that can surpass any individual human, and perhaps all humanity as a whole, in all intellectual tasks. Given the impressive cognitive accomplishments of some mere humans, it seems very unlikely that even a single, unaligned superintelligent AI would not eventually lead to terrible outcomes for humanity, even if considerable efforts are made to contain it.
This analysis from November 2021, harnessing Metaculus predictions, indicates that AI is considered the most likely cause of a hypothetical near or complete extinction event this century.
In the roughly four and a half months between that analysis and April 10th, 2022, the community's forecast of a global catastrophe this century rose from 20% to 30%. On the other hand, the probability of an AI catastrophe, conditional on some global catastrophe, moved from ~25% to ~20%, and the probability of extinction scenarios, conditional on an AI catastrophe, has mostly fluctuated around 67%.
And, although a 4% chance of extinction this century is certainly not small, other bad outcomes are also possible if we create superintelligent AIs that do not share our values and that we cannot control.
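The ~4% figure follows from chaining the conditional forecasts quoted above; a quick sanity check:

```python
# April 2022 community values as cited in the text (approximate).
p_catastrophe = 0.30            # global catastrophe this century
p_ai_given_catastrophe = 0.20   # catastrophe is AI-caused, given one occurs
p_extinction_given_ai = 0.67    # AI catastrophe reaches (near-)extinction

# Chain rule: P(AI extinction) = P(catastrophe) * P(AI | catastrophe)
#                               * P(extinction | AI catastrophe)
p_ai_extinction = p_catastrophe * p_ai_given_catastrophe * p_extinction_given_ai
print(f"Implied probability of AI-driven extinction: {p_ai_extinction:.1%}")
# ≈ 4.0%, matching the ~4% figure in the text
```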
Progress in machine learning has been very impressive in recent years. Existing systems already seem to have the potential for a large economic and scientific impact (text-to-image generation, protein folding prediction, code generation, robotics, et cetera). It is possible that we are not far from transformative AI, or even a strong form of it: AI systems capable of automating or considerably speeding up scientific and technological advancement. Even if such technologies do not initially require human-level AI, it seems very probable that they would quickly facilitate the development of human-level or superhuman AI.
So, what are the community predictions regarding the arrival of AGI? The following question technically asks about an event that occurs after the actual creation of the AGI. But, judging from the community median on this question, the difference is not very relevant at a timescale of years.
As of April 10th, the community puts ~50% of the probability mass on a system achieving all these criteria by 2037, and more than 25% on it happening before 2030.
If AGI is developed, what would happen next? How long between AGI and the first superintelligent AI?
These numbers paint very grim prospects for those who plan to delay alignment research until the first AGI system has been created: the window of available time between the first AGI and an overwhelmingly intelligent unaligned system would be too narrow.
In October 2015, DeepMind's AlphaGo achieved the first victory against a professional Go player, Fan Hui, by 5-0. Before this achievement, Go programs were only capable of playing at the level of human amateurs, and professional-level play still seemed far away.
This match was not disclosed to the public until 27 January 2016. Soon after that, the community prediction for the following question climbed from 29% to 90% (the reason for not jumping higher could be that the question asked for the game to be played in 2016, and in an official setting).
In 2020, DeepMind introduced AlphaFold, which basically solved the protein folding prediction problem: determining, within a certain margin of error, the shape that a protein will fold into. AlphaFold achieves a median performance on this problem comparable to that of modern, costly, and slow experimental methods. AlphaFold's achievement probably came faster than most expected: near the end of 2020, the Metaculus community assigned an 80% chance to something like this happening before 2031, which naively corresponds to a modest uniform annual probability. After the impressive AlphaFold results were announced, the community prediction promptly jumped to 99%.
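That implied per-year figure can be reconstructed under simple assumptions (mine, not the essay's): an 80% chance of resolution spread over the roughly ten years from late 2020 to 2031, read either as an equal split per year or as a constant annual hazard rate.

```python
p_total = 0.80   # community forecast: resolution before 2031
years = 10       # approximate horizon from late 2020 (an assumption)

# Naive reading: the 80% is split equally across the ten years.
p_naive = p_total / years

# Constant-hazard reading: the same annual probability p each year,
# chosen so that 1 - (1 - p)**years == p_total.
p_hazard = 1 - (1 - p_total) ** (1 / years)

print(f"naive: {p_naive:.1%} per year, constant hazard: {p_hazard:.1%} per year")
```

Either reading gives a per-year probability far below the ~99% the community assigned once the AlphaFold results were announced, which is what makes the jump notable.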
In February 2019, OpenAI presented GPT-2, a language model able to complete an arbitrary text input by generating somewhat realistic and coherent continuations. The outputs sometimes seemed meandering and distracted, or contained failures of real-world modeling, but they were very impressive nonetheless and could seem almost human-like on a cursory reading. In May 2020, GPT-3 was formally introduced, achieving a much higher text quality.
There is a Metaculus question that tries to operationalize how surprising/sudden AI progress will be. Around the introduction of GPT-3, this question's median went from about 62% to 68%, and it has stayed above 68% ever since.
Relatedly and remarkably, perhaps due to a quick succession of impressive news, the two main questions previously mentioned with regard to AGI timelines received considerable downward updates in their forecasts during the writing of this essay. The median date for the first AGI has moved earlier by 6 years (from 2043 to 2037), while the median time for the transition from AGI to superintelligence has gone down by more than a year (from 7.15 to 5.95 years).
Given the current pace of growth for AI capabilities, rapidly increasing investments in the area, and the precedents of unexpected capability spikes, a fundamental question is whether humanity's efforts for AI alignment (or, more generally, for solving the control problem) will manage to find solutions in time.
A simple measure of the effort being spent in a research field is counting the number of papers being published:
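For illustration, such a count could be gathered per year from arXiv's public API; the search phrase "AI safety" and the use of this endpoint are my assumptions here, not the methodology behind the forecasts discussed below.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def arxiv_count_url(phrase, year):
    """Build an arXiv API query URL counting e-prints matching a phrase
    submitted in a given year. (Illustrative; any keyword is only a
    rough proxy for activity in a field.)"""
    search = (f'all:"{phrase}" AND submittedDate:'
              f"[{year}01010000 TO {year}12312359]")
    # max_results=0 returns only the feed metadata, including the total count.
    params = {"search_query": search, "max_results": 0}
    return f"{ARXIV_API}?{urlencode(params)}"

print(arxiv_count_url("AI safety", 2021))
# The Atom response's <opensearch:totalResults> element holds the count.
```

Fetching one such URL per year and plotting the totals would reproduce the kind of trend the forecasts below are about, modulo the keyword-drift caveats discussed next.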
Beyond considerations about keyword drift and how representative arXiv e-prints are of overall publication in the area, it is important not to mistake the proxy for what we actually want to measure: progress in AI-safety research. Indeed, not all AI-safety research is published, and the unpublished proportion could increase in the future: concerns about the risks of disclosure could become more common, and it might become more widely accepted to prioritize results over the considerations more typical of academic careers.
But, unless the graphs are interpreted under optimistic assumptions, the eventual deceleration that can be inferred from these two forecasts is not encouraging.
Now, given that the number of papers is not a very reliable proxy for advancement or interest in the area, another possibility is to observe the amount of investment it is receiving. Metaculus currently does not have many questions of this type, but there is a pair that asks about a particular grantmaker.
In any case, it would seem that funds are not currently the main constraint for progress in AI-risk reduction, and that there are many other, more important bottlenecks. Having more of these forecasting questions, asking about other indicators of progress and attention in the area, would be useful to form a more complete understanding of the expected trajectories of AI-safety research in the upcoming years.
Currently, the Metaculus community does not expect the control problem to be credibly solved before the first AGI, and expects that the transition time from AGI to superintelligence will be a short one. In contrast, the indicators of attention/resources dedicated to AI-safety research do not seem to be growing nearly as fast as the general progress and investment into AI capabilities.
At the moment, Metaculus does not have many questions about the expected effects that different interventions or research would have on reducing AI risks. Having more forecasts distinguishing outcomes conditional on different scenarios could be very informative when trying to understand which actions may get us closer to workable solutions or temporary mitigations for AI risk. But, for now, it seems like current trajectories have a worryingly high probability of concluding in very undesirable outcomes for humanity. And the timelines keep getting shorter.
Summary and final remarks