On May 31st, 2022, prominent deep learning skeptic and NYU professor emeritus Gary Marcus challenged Elon Musk to a bet on AGI by the end of 2029. His proposed bet consists of 5 AI achievements, of which he predicted no more than 2 would come to pass before 2030. This question is about Marcus' fourth prediction:
In 2029, AI will not be able to reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]
Will an AI be able to reliably construct bug-free code of more than 10,000 lines before 2030?
This question resolves positively if, before January 1st, 2030, there is a public and credible demonstration of an AI writing code that clearly indicates the capability to do either of the following:
(1) Given a natural language description of a complex computer program comparable to the non-research-related ideas found in this list of programming projects, the AI is able to write a computer program that satisfies the description to a satisfactory degree in at least 90.0% of cases. A computer program is said to have satisfied the conditions of a natural language description if there is a consensus among Metaculus admins that the code satisfies the conditions, without any major bugs. Minor bugs, such as the code occasionally crashing, will not disqualify any AI, as these are common even for professional human programmers.
(2) The AI is able to perform (1) when given the ability to interact with a non-expert user. A non-expert user is defined as someone who credibly reports not being able to write code that satisfies the conditions of these project ideas, but who is able to operate a computer well enough to understand whether a given computer program passes the requirements to a satisfactory degree.
Importantly, as per Marcus' constraint, we will not allow the AI to simply glue together code from existing libraries. It must generate code de novo, meaning that a plagiarism detector on par with the Copyleaks code plagiarism checker would not flag the code as definitively indicating cheating in more than 5% of cases.
We will use the best judgement of Metaculus administrators to resolve ambiguities in the conditions above.
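The two numeric thresholds above (a satisfactory program in at least 90.0% of cases, and code definitively flagged by the plagiarism checker in no more than 5% of cases) can be illustrated with a short sketch. This is only an illustration of the arithmetic, not an official resolution procedure: the `Trial` record, the `meets_thresholds` function, and the idea of a fixed batch of recorded trials are all assumptions made for the example, and actual resolution rests on admin judgement.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One hypothetical demonstration run (illustrative record, not from the question)."""
    satisfied: bool  # admins agreed the program met its description without major bugs
    flagged: bool    # plagiarism checker definitively flagged the generated code


def meets_thresholds(trials, min_success_pct=90, max_flagged_pct=5):
    """Return True if the batch clears both numeric thresholds.

    Integer arithmetic is used so that exact boundary cases (e.g. 18 of 20)
    are not affected by floating-point rounding.
    """
    n = len(trials)
    if n == 0:
        return False  # no evidence, so nothing is demonstrated
    successes = sum(t.satisfied for t in trials)
    flags = sum(t.flagged for t in trials)
    enough_successes = successes * 100 >= min_success_pct * n   # at least 90%
    few_enough_flags = flags * 100 <= max_flagged_pct * n       # no more than 5%
    return enough_successes and few_enough_flags


# Example batch: 20 trials, 18 satisfactory, 1 flagged.
batch = [Trial(satisfied=(i < 18), flagged=(i == 0)) for i in range(20)]
print(meets_thresholds(batch))
```

In this hypothetical batch, 18 of 20 trials is exactly 90% and 1 of 20 is exactly 5%, so both conditions sit on their boundaries and are still met. One fewer satisfactory program, or one more flagged program, would fail the check.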