
Q1 AI Benchmarking Results: Pros Crush Bots

Q1 AI Forecasting Benchmark Tournament

Will a Gemini model be ranked #1 overall on the Chatbot Arena Leaderboard at the end of the 2nd Quarter of 2025?

Annulled

Will an OpenAI model be ranked #1 overall on the Chatbot Arena Leaderboard at the end of the 2nd Quarter of 2025?

This question is closed for forecasting. Latest Community prediction is displayed.

35% chance

Contributed by the Risk Threshold Forecasting community.

When will an 8 hour, 80% reliability time horizon be achieved on METR’s Autonomy Tasks by a Gemini 2.5 Pro scale model by Google?

Contributed by the Risk Threshold Forecasting community.

When will OpenAI first report that an AI system has a risk level of High on Cybersecurity?

Contributed by the Risk Threshold Forecasting community.

When will OpenAI first report that an AI system has a risk level of Critical on Cybersecurity?

Contributed by the Risk Threshold Forecasting community.

When will OpenAI first report that an AI system has a risk level of High on AI Self-improvement?

Contributed by the Risk Threshold Forecasting community.

When will OpenAI first report that an AI system has a risk level of Critical on AI Self-improvement?

Contributed by the Risk Threshold Forecasting community.

When will Google first report that an AI system reached Instrumental Reasoning level 1?

Contributed by the Risk Threshold Forecasting community.

When will Google first report that an AI system reached Instrumental Reasoning level 2?
