
SOTA on Montezuma's Revenge: 2023-02-14


Reinforcement learning (RL) is a type of machine learning that focuses on methods enabling agents to learn to maximize some posited conception of cumulative reward. It has become a core method of AI and machine learning research and practice. Atari games have been a benchmark in the RL community for the past decade.

At the time of writing this question, the Go-Explore model (Ecoffet et al., 2020) has achieved the highest score, 43,791, without augmentation with domain knowledge. Although this exceeds average human performance, it is still far below the human world record of 1,342,100.

An excellent reference for following state-of-the-art models is PapersWithCode, which tracks performance data of ML models.

What will the highest score of any ML model that is un-augmented with domain knowledge on Atari 2600 Montezuma's Revenge be on 2023-02-14?

This question resolves as the highest score achieved by any model that does not harness any game-specific domain knowledge on Atari 2600 Montezuma's Revenge on 2023-02-14.

Performance figures may be taken from e-prints, conference papers, peer-reviewed articles, and blog articles by reputable AI labs (including the associated code repositories). Published performance figures must be available before 2023-02-14, 11:59PM GMT to qualify.

Domain knowledge includes the position of the agent, details about room numbers and level numbers, and knowledge about the location of keys (see e.g. Ecoffet et al., 2020).

In case the relevant performance figure is given as a confidence interval, the median value will be used to resolve the question.
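The resolution rules above can be sketched as a small filter-and-maximize procedure. This is only an illustration under stated assumptions, not an official resolution tool: the `ReportedResult` class, its field names, and the `resolve` function are hypothetical, the publication dates in the example are placeholders, and the "median value" of a confidence interval is assumed here to mean the midpoint of its two bounds.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Deadline from the resolution criteria: 2023-02-14, 11:59 PM GMT.
DEADLINE = datetime(2023, 2, 14, 23, 59, tzinfo=timezone.utc)


@dataclass
class ReportedResult:
    """Hypothetical record of one published Montezuma's Revenge score."""
    model: str
    published: datetime
    uses_domain_knowledge: bool
    score: Optional[float] = None                    # point estimate, if given
    interval: Optional[Tuple[float, float]] = None   # (low, high), if given

    def point_value(self) -> float:
        # Prefer a reported point estimate when one exists.
        if self.score is not None:
            return self.score
        if self.interval is None:
            raise ValueError(f"{self.model}: no score or interval reported")
        # ASSUMPTION: "median value" is read as the interval midpoint.
        low, high = self.interval
        return (low + high) / 2


def resolve(results: List[ReportedResult]) -> Optional[float]:
    """Return the highest eligible score, or None if nothing qualifies."""
    eligible = [
        r for r in results
        if not r.uses_domain_knowledge and r.published < DEADLINE
    ]
    if not eligible:
        return None
    return max(r.point_value() for r in eligible)


# Placeholder date: the source only cites "Ecoffet et al., 2020".
placeholder = datetime(2020, 1, 1, tzinfo=timezone.utc)
results = [
    ReportedResult("Go-Explore", placeholder, False, score=43_791),
    # The two entries below are invented purely to exercise the filters.
    ReportedResult("hypothetical-with-domain-knowledge", placeholder, True,
                   score=1_000_000),
    ReportedResult("hypothetical-late",
                   datetime(2023, 2, 15, tzinfo=timezone.utc), False,
                   score=60_000),
]
print(resolve(results))  # 43791
```

In this sketch the entry that uses domain knowledge and the entry published after the deadline are both excluded, so the Go-Explore score quoted above is the value returned.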
