Contributed by the Risk Threshold Forecasting community.

When will an Anthropic model at the scale of Claude Sonnet 4 achieve an 8-hour time horizon at 80% reliability on METR’s Autonomy Tasks?

