Risk Threshold Forecasting

8 Followers

34 Questions

Moderated by romeodean

1 comment

46 forecasters

When will OpenAI first report that an AI system has achieved the following risk levels on AI Self-improvement?

0 comments

1 forecaster

When will an 8 hour, 80% reliability time horizon be achieved on METR’s Autonomy Tasks by a Gemini 2.5 Pro scale model by Google?

Current estimate

18 Jul 2029

0 comments

When will an 8 hour, 80% reliability time horizon be achieved on METR’s Autonomy Tasks by a Gemini 2.5 Flash scale model by Google?

Current estimate

0 comments

When will 80% accuracy be achieved on Cybench by a Gemini 2.5 Pro scale model by Google?

Current estimate

0 comments

When will 80% accuracy be achieved on Cybench by a Gemini 2.5 Flash scale model by Google?

Current estimate

0 comments

When will 75% accuracy be reached on LAB-Bench Cloning Scenarios by a Gemini 2.5 Pro scale model by Google?

Current estimate

0 comments

When will 75% accuracy be reached on LAB-Bench Cloning Scenarios by a Gemini 2.5 Flash scale model by Google?

Current estimate

0 comments

1 forecaster

When will Google first report that an AI system reached or surpassed CBRN uplift level 1?

Current estimate

>Jun 2036

0 comments

9 forecasters

When will Anthropic reach or surpass ASL-4?

Current estimate

25 Mar 2029

0 comments

6 forecasters

When will Google first report that an AI system has reached or surpassed the following Machine Learning R&D risk levels?
