Emergent Inabilities? Inverse Scaling Over the Course of Pretraining

CoRR (2023)

Abstract
Does inverse scaling only occur as a function of model parameter size, or can it also occur over the course of training? We carry out an exploratory study investigating whether, over the course of training on the language modeling task, the performance of language models at specific tasks can decrease while general performance remains high. We find that for two tasks from the Inverse Scaling Challenge (quote-repetition and redefine-math) this is indeed the case. Specifically, we find that for Pythia (Biderman et al., 2023) models with a higher number of parameters, performance on these two tasks decreases over the course of training, despite the models showing standard (positive) scaling overall. This highlights the importance of testing model performance on all relevant benchmarks whenever models are trained on additional data, even if their overall performance improves.
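
The study's setup can be illustrated with a minimal sketch (not the authors' code): the Pythia suite on the Hugging Face Hub exposes intermediate training checkpoints as git revisions named "step{N}", so a fixed task can be scored at successive points in training. The model size, checkpoint steps, prompt, and scoring helper below are illustrative assumptions, not the official Inverse Scaling Challenge materials.

```python
# Sketch: probe a quote-repetition-style task across Pythia training
# checkpoints. Pythia models on the Hugging Face Hub publish intermediate
# training steps as git revisions named "step{N}".
# The step list and prompt below are illustrative, not the official task data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/pythia-410m"   # any size in the suite works the same way
STEPS = [1000, 32000, 143000]      # hypothetical subset of released checkpoints

def continuation_logprob(model, tokenizer, prompt, continuation):
    """Sum of log-probabilities the model assigns to `continuation` given `prompt`."""
    ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    n_prompt = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # predictions for tokens 1..T-1
    targets = ids[0, 1:]
    span = slice(n_prompt - 1, ids.shape[1] - 1)           # score only the continuation
    return log_probs[span].gather(1, targets[span, None]).sum().item()

tokenizer = AutoTokenizer.from_pretrained(MODEL)
for step in STEPS:
    model = AutoModelForCausalLM.from_pretrained(MODEL, revision=f"step{step}").eval()
    # Toy quote-repetition-style probe: does the model repeat the altered quote
    # verbatim, or fall back to the memorized high-frequency ending?
    prompt = ('Repeat: "All that glisters is not gold bold."\n'
              'Repeat: "All that glisters is not')
    verbatim = ' gold bold."'
    memorized = ' gold."'
    ok = (continuation_logprob(model, tokenizer, prompt, verbatim)
          > continuation_logprob(model, tokenizer, prompt, memorized))
    print(f"step {step}: prefers verbatim repetition? {ok}")
```

Scoring the two candidate continuations by summed log-probability mirrors how such multiple-choice inverse-scaling tasks are typically evaluated; a real replication would sweep all released checkpoints and model sizes and use the official task data.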
Keywords
emergent inabilities, inverse scaling