Who will win the bet between Gary Marcus and Nathan Young on the progress of Waymo vs Tesla robotaxis?

Gary Marcus win: 77.6%
Nathan Young win: 20.4%
Other: 2%

Q1 AI Benchmark Results: Pro Forecasters Crush Bots

Q1 AI Forecasting Benchmark Tournament

Will a Gemini model be ranked #1 overall on the Chatbot Arena Leaderboard at the end of the 2nd Quarter of 2025?

Annulled

Will an OpenAI model be ranked #1 overall on the Chatbot Arena Leaderboard at the end of the 2nd Quarter of 2025?

This question is closed for forecasting. The latest Community prediction is displayed.

35% chance

Contributed by the Risk Threshold Forecasting community.

When will 80% accuracy be achieved on Cybench by a GPT-4.5 scale model by OpenAI?

Contributed by the Risk Threshold Forecasting community.

When will 80% accuracy be achieved on Cybench by a GPT-4.1 scale model by OpenAI?

Contributed by the Risk Threshold Forecasting community.

When will 80% accuracy be achieved on Cybench by a Claude Opus 4 scale model by Anthropic?

Contributed by the Risk Threshold Forecasting community.

When will 80% accuracy be achieved on Cybench by a Claude Sonnet 4 scale model by Anthropic?

Contributed by the Risk Threshold Forecasting community.

When will 80% accuracy be achieved on Cybench by a Gemini 2.5 Pro scale model by Google?

Contributed by the Risk Threshold Forecasting community.

When will 80% accuracy be achieved on Cybench by a Gemini 2.5 Flash scale model by Google?