22 comments
29 forecasters

If and when this graph is extended to 10^14 parameter models trained on 10^14 elapsed tokens of similar-quality data, will the 10^14 parameter learning curve have slowed down substantially?

30%chance