Contributed by the Risk Threshold Forecasting community.

1 forecaster

When will an 8-hour, 80%-reliability time horizon be achieved on METR’s Autonomy Tasks by a Gemini 2.5 Pro-scale model from Google?

Current estimate
18 Jul 2029