Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
CoRR (2024)
Abstract
Auto-regressive large language models (LLMs) show an impressive capacity to
solve many complex reasoning tasks, yet struggle with some simple logical
reasoning tasks such as inverse search: when trained on "A is B", an LLM fails
to directly conclude "B is A" during inference, which is known as the
"reversal curse" (Berglund et al., 2023). In this paper, we theoretically
analyze the reversal curse via the training dynamics of (stochastic) gradient
descent for two auto-regressive models: (1) a bilinear model that can be viewed
as a simplification of a one-layer transformer; (2) one-layer transformers
using the framework of Tian et al. (2023a). Our analysis reveals a core reason
why the reversal curse happens: the (effective) weights of both auto-regressive
models show asymmetry, i.e., an increase in the weights from token A to token
B during training does not necessarily cause an increase in the weights from
B to A. Moreover, our analysis can be naturally applied to other logical
reasoning tasks such as chain-of-thought (CoT) (Wei et al., 2022b). We show the
necessity of CoT, i.e., a model trained on "A → B" and "B → C"
fails to directly conclude "A → C" without CoT (also empirically observed
by Allen-Zhu and Li (2023)), for one-layer transformers via training dynamics,
which provides a new perspective different from previous work (Feng et al.,
2024) that focuses on expressivity. Finally, we also conduct experiments to
validate our theory on multi-layer transformers under different settings.
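
The weight asymmetry described above can be illustrated with a toy sketch. This is not the paper's exact setup: it assumes one-hot token embeddings, so the bilinear logit for "next token y given token x" reduces to a single matrix entry `W[x, y]`, and it uses plain full-batch gradient descent on the cross-entropy loss. The function name `train_bilinear`, the vocabulary size, the step count, and the learning rate are all hypothetical choices made for illustration.

```python
import numpy as np

def train_bilinear(pairs, vocab_size, steps=200, lr=0.5):
    """Gradient descent on next-token cross-entropy for a toy bilinear model.

    Assumption: one-hot embeddings, so logits for context token x are W[x, :].
    """
    W = np.zeros((vocab_size, vocab_size))
    for _ in range(steps):
        grad = np.zeros_like(W)
        for a, b in pairs:
            logits = W[a]
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            probs[b] -= 1.0          # d(loss)/d(logits) = softmax - one_hot(b)
            grad[a] += probs         # only the row of the context token is touched
        W -= lr * grad / len(pairs)
    return W

A, B = 0, 1
W = train_bilinear([(A, B)], vocab_size=4)   # train only on "A -> B"

print("weight A -> B:", W[A, B])   # grows during training
print("weight B -> A:", W[B, A])   # never receives a gradient in this toy model
```

In this simplified setting, training on "A → B" only updates the row of `W` indexed by A, so the entry from B to A stays at its initial value and the model has no preference for A when prompted with B. With general (non-one-hot) embeddings the picture is less clear-cut, which is the regime the paper analyzes.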