Date of Artificial General Intelligence

AI Progress Essay Contest

Question

This question is a duplicate of this one with a stronger operationalization for artificial general intelligence, and including robotic capabilities. I will copy relevant parts of that question to this one.

Since the inception of the field, the goal of Artificial Intelligence (AI) research has been to develop a machine-based system that can perform the same general-purpose reasoning and problem-solving tasks humans can. While computers have surpassed humans in many information-processing abilities, this "general" intelligence has remained elusive.

AI, and particularly machine learning (ML), is advancing rapidly, with previously human-specific tasks such as image and speech recognition, translation, and even driving now being successfully tackled by narrow AI systems.

But there is a stunning diversity of opinion about when general AI may arrive, according to published expert surveys. For example, this study finds that 50% of AI researchers assign a 50% probability to "High level machine intelligence" (HLMI) by 2040, while 20% say that 50% probability will not be reached until 2100 or later. Similarly, this survey finds an aggregated probability distribution with a 25%-75% confidence interval (comparable to Metaculus sliders below) ranging from 2040 to well past 2100.

It would be nice to tighten these probability intervals considerably, so we ask of the Metaculus community:

When will the first general AI system be devised, tested, and publicly known of?

We will thus define "an AI system" as a single unified software system that can satisfy the following criteria, all completable by at least some humans.

  • Able to reliably pass a 2-hour, adversarial Turing test during which the participants can send text, images, and audio files (as is done in ordinary text messaging applications) during the course of their conversation. An 'adversarial' Turing test is one in which the human judges are instructed to ask interesting and difficult questions, designed to advantage human participants, and to successfully unmask the computer as an impostor. A single demonstration of an AI passing such a Turing test, or one that is sufficiently similar, will be sufficient for this condition, so long as the test is well-designed in the estimation of Metaculus Admins.

  • Has general robotic capabilities, of the type able to autonomously, when equipped with appropriate actuators and when given human-readable instructions, satisfactorily assemble a (or the equivalent of a) circa-2021 Ferrari 312 T4 1:8 scale automobile model. A single demonstration of this ability, or a sufficiently similar demonstration, will be considered sufficient.

  • High competency across diverse fields of expertise, as measured by achieving at least 75% accuracy in every task and 90% mean accuracy across all tasks in the Q&A dataset developed by Dan Hendrycks et al.

  • Able to get top-1 strict accuracy of at least 90.0% on interview-level problems found in the APPS benchmark introduced by Dan Hendrycks, Steven Basart et al. Top-1 accuracy is distinguished, as in the paper, from top-k accuracy in which k outputs from the model are generated, and the best output is selected.
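The two benchmark thresholds above can be made concrete with a minimal sketch. It assumes that "mean accuracy across all tasks" is an unweighted average of per-task accuracies, and that each generated program is marked as passing only if it passes every test case (strict accuracy). The function names and input layouts are hypothetical and are not taken from either benchmark's code.

```python
from statistics import mean


def meets_qa_criterion(task_accuracy, per_task_min=0.75, mean_min=0.90):
    """Return True if every task reaches per_task_min AND the
    unweighted mean over tasks reaches mean_min.

    task_accuracy: dict mapping task name -> accuracy in [0, 1].
    The unweighted mean is an assumption about how the criterion is read.
    """
    if not task_accuracy:
        return False
    accs = list(task_accuracy.values())
    return min(accs) >= per_task_min and mean(accs) >= mean_min


def top_k_accuracy(passed, k):
    """Fraction of problems solved by at least one of the first k outputs.

    passed: list with one inner list per problem; passed[i][j] is True if
    the j-th generated program for problem i passes all of its tests.
    With k=1 only the first output counts, which is top-1 accuracy.
    """
    solved = sum(1 for attempts in passed if any(attempts[:k]))
    return solved / len(passed)


# Hypothetical numbers: a high mean does not rescue one weak task.
qa_ok = meets_qa_criterion({"a": 0.95, "b": 0.90, "c": 0.88})
qa_weak_task = meets_qa_criterion({"a": 0.99, "b": 0.99, "c": 0.99, "d": 0.74})

# Hypothetical pass/fail grid for four problems with two outputs each.
grid = [[True, False], [False, True], [False, False], [True, True]]
top1 = top_k_accuracy(grid, k=1)
top2 = top_k_accuracy(grid, k=2)
```

In this toy grid, top-2 accuracy is higher than top-1 because the second problem is solved only by the second output, which is why the criterion fixes k at 1.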

By "unified" we mean that the system is integrated enough that it can, for example, explain its reasoning on a Q&A task, or verbally report its progress and identify objects during model assembly. (This is not really meant to be an additional capability of "introspection" so much as a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems.)

Resolution will come from any of three forms, whichever comes first: (1) direct demonstration of such a system achieving ALL of the above criteria, (2) confident credible statement by its developers that an existing system is able to satisfy these criteria, or (3) judgement by a majority vote in a special committee composed of the question author and two AI experts chosen in good faith by him, for the sole purpose of resolving this question. Resolution date will be the first date at which the system (subsequently judged to satisfy the criteria) and its capabilities are publicly described in a talk, press release, paper, or other report available to the general public.

[May 16, 2022 - casens: The resolution criteria have been clarified and edited. See the previous criteria in the fine print]

Resolution criteria prior to May 16, 2022:

We will thus define "an AI system" as a single unified software system that can satisfy the following criteria, all completable by at least some humans.

  • Able to reliably pass a Turing test of the type that would win the Loebner Gold Prize. The gold prize is reserved for, "the first bot that can pass an extended Turing Test involving textual, visual, and auditory components."

  • Has general robotic capabilities, of the type able to autonomously, when equipped with appropriate actuators, satisfactorily assemble a (or the equivalent of a) circa-2020 de Agostini 1:8 scale automobile model.

  • High competency across diverse fields of expertise, as measured by achieving at least 75% accuracy in every task and 90% mean accuracy across all tasks in the Q&A dataset developed by Dan Hendrycks et al.

  • Able to take a simple text description and turn it into a program coded in C/Python. In particular, we'll ask that in at least 9 out of 10 trials, the system can take the specification of a simple program from a list comparable to the "intermediate" section of this one, and output executable C or Python code that does the assigned task.

By "unified" we mean that the system is integrated enough that it can, for example, explain its reasoning on a Q&A task, or verbally report its progress and identify objects during model assembly. (This is not really meant to be an additional capability of "introspection" so much as a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems.)

Resolution will be by direct demonstration of such a system achieving the above criteria, or by confident credible statement by its developers that an existing system is able to satisfy these criteria. In case of contention as to whether a given system satisfies the resolution criteria, a ruling will be made by a majority vote of the question author and two AI experts chosen in good faith by him. Resolution date will be the first date at which the system (subsequently judged to satisfy the criteria) and its capabilities are publicly described in a talk, press release, paper, or other report available to the general public.

(Edited 2020-10-15 to strengthen programming task and weaken construction task.)
