Will an AI system do credibly well on a full math SAT exam by 2025?
Humans have devised many ways of assessing other humans' intelligence, and of compelling people to take part in such assessments. University entrance exams are among the most familiar: each year they are inflicted on countless high school students as standardized measures of academic competence and promise. Recently, these exams have become the target of AI and machine learning projects.
According to a report by Engadget, Japan’s National Institute of Informatics had been working since 2011 on an AI whose ultimate goal was to pass the entrance exam for the University of Tokyo, tentatively by March 2022. However, a more recent report revealed that the institute is ending the project because its AI cannot fully grasp the broader context of the exam questions.
Separately, on September 21, 2015, the Allen Institute for Artificial Intelligence (AI2) announced in a paper that it had created an AI system called GeoS that can solve SAT geometry questions "as well as the average 11th-grade American student." According to this story, GeoS "uses a combination of computer vision to interpret diagrams, natural language processing to read and understand text, and a geometric solver to achieve 49 percent accuracy on geometry questions from the official SAT tests. If these results were extrapolated to the entire Math SAT test, the computer roughly achieved an SAT score of 500 (out of 800), the average test score for 2015." Although AI2 initially focused GeoS on plane geometry questions, it hoped to tackle the full set of Math SAT questions by 2018.
Reaching average performance on geometry questions is no easy feat, but doing genuinely well on the full exam, across every type of question, may be significantly harder. We ask:
By end of 2025, will an AI system achieve the equivalent of 75th percentile on the full mathematics section of an SAT exam comparable to those circa 2015?
Resolution is by credible media report or published paper. The system must be given only images of the exam pages, and must be trained on exams that do not include any questions from the scored test. Exams will count as long as their topics and difficulty are broadly comparable to those of the 2015 exams.