Small-scale Proxies for Large-Scale Transformer Training Instabilities
Mitchell Wortsman, Peter J Liu, Lechao Xiao, Katie Everett, Alexander A Alemi, Ben Adlam, John D Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith
ICLR 2024 (2024)
Cited by 79 | Views 376
Keywords: Small Transformers, Training, Stability