When can transformers reason with abstract symbols?

ICLR 2024 (2023)

Abstract
We investigate the capabilities of transformer large language models (LLMs) on relational reasoning tasks involving abstract symbols. Such tasks have long been studied in the neuroscience literature as fundamental building blocks for more complex abilities in programming, mathematics, and verbal reasoning. For (i) regression tasks, we prove that transformers generalize when trained, but require astonishingly large quantities of training data. For (ii) next-token-prediction tasks with symbolic labels, we show an "inverse scaling law": transformers fail to generalize as their embedding dimension increases. For both settings (i) and (ii), we propose subtle transformer modifications, adding just two trainable parameters per head, which reduce the amount of training data needed.
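The abstract does not specify the form of the per-head modification. The sketch below is a hypothetical illustration, assuming the two trainable scalars per head weight an identity (attend-to-self) component and a uniform-averaging component added to the attention pattern; the module name `AugmentedSelfAttention` and this exact formulation are assumptions, not the paper's stated method.

```python
# Hypothetical sketch of self-attention with two extra trainable scalars per head.
# The scalars' role (identity + uniform attention components) is an assumption.
import torch
import torch.nn as nn


class AugmentedSelfAttention(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim, bias=False)
        self.out = nn.Linear(embed_dim, embed_dim, bias=False)
        # Two extra trainable parameters per head (assumed interpretation).
        self.diag_scale = nn.Parameter(torch.zeros(num_heads))     # weights an identity pattern
        self.uniform_scale = nn.Parameter(torch.zeros(num_heads))  # weights a uniform pattern

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):
            # (B, T, D) -> (B, heads, T, head_dim)
            return t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        # Add per-head identity and uniform components to the attention pattern.
        eye = torch.eye(T, device=x.device).view(1, 1, T, T)
        ones = torch.full((1, 1, T, T), 1.0 / T, device=x.device)
        attn = (attn
                + self.diag_scale.view(1, -1, 1, 1) * eye
                + self.uniform_scale.view(1, -1, 1, 1) * ones)
        y = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.out(y)


if __name__ == "__main__":
    layer = AugmentedSelfAttention(embed_dim=64, num_heads=4)
    print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```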
Keywords
transformers,language models,reasoning,theoretical analysis,variable binding