Monotonic Representation of Numeric Properties in Language Models
arXiv (2024)
Abstract
Language models (LMs) can express factual knowledge involving numeric
properties, such as "Karl Popper was born in 1902". However, how this
information is encoded in the model's internal representations is not well understood.
Here, we introduce a simple method for finding and editing representations of
numeric properties such as an entity's birth year. Empirically, we find
low-dimensional subspaces that encode numeric properties monotonically, in an
interpretable and editable fashion. When editing representations along
directions in these subspaces, LM output changes accordingly. For example, by
patching activations along a "birthyear" direction we can make the LM express
an increasingly late birth year: "Karl Popper was born in 1929", "Karl Popper was
born in 1957", "Karl Popper was born in 1968". Property-encoding directions exist
across several numeric properties in all models under consideration, suggesting
the possibility that monotonic representation of numeric properties
consistently emerges during LM pretraining. Code:
https://github.com/bheinzerling/numeric-property-repr
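The abstract does not spell out how a property-encoding direction is found or edited. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the property is encoded linearly, recovers a "birthyear" direction with the first component of partial least-squares (PLS) regression (our choice for illustration; the repository above is the reference for the actual procedure), and then shifts an activation along that direction. The toy activations are synthetic, and all function names (`make_toy_data`, `first_pls_direction`, `fit_readout`, `predict`, `edit_along`) are hypothetical; with a real model, the activations would instead be hidden states extracted from the LM.

```python
import math
import random


def make_toy_data(n=200, dim=8, noise=0.3, seed=0):
    """Hypothetical synthetic 'hidden states': a birth year is encoded
    linearly along one unknown unit direction, plus Gaussian noise."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in raw))
    true_dir = [v / norm for v in raw]
    acts, years = [], []
    for _ in range(n):
        year = rng.uniform(1800.0, 2000.0)
        signal = (year - 1900.0) / 50.0  # scaled position along true_dir
        acts.append([signal * u + rng.gauss(0.0, noise) for u in true_dir])
        years.append(year)
    return acts, years, true_dir


def first_pls_direction(acts, values):
    """First PLS weight vector: X_c^T y_c (centered data) normalized to unit length."""
    n, dim = len(acts), len(acts[0])
    mean_x = [sum(a[j] for a in acts) / n for j in range(dim)]
    mean_y = sum(values) / n
    w = [sum((a[j] - mean_x[j]) * (y - mean_y) for a, y in zip(acts, values))
         for j in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w], mean_x, mean_y


def fit_readout(acts, values, w, mean_x, mean_y):
    """Least-squares slope from the projection score onto w to the numeric value."""
    scores = [sum((a[j] - mean_x[j]) * w[j] for j in range(len(w))) for a in acts]
    num = sum(s * (y - mean_y) for s, y in zip(scores, values))
    return num / sum(s * s for s in scores)


def predict(act, w, mean_x, mean_y, slope):
    """Linear readout of the numeric property from a single activation."""
    score = sum((act[j] - mean_x[j]) * w[j] for j in range(len(w)))
    return mean_y + slope * score


def edit_along(act, w, alpha):
    """Shift an activation by alpha units along the property direction w."""
    return [a + alpha * wj for a, wj in zip(act, w)]


acts, years, true_dir = make_toy_data()
w, mean_x, mean_y = first_pls_direction(acts, years)
slope = fit_readout(acts, years, w, mean_x, mean_y)
cosine = sum(a * b for a, b in zip(w, true_dir))
edited_preds = [predict(edit_along(acts[0], w, alpha), w, mean_x, mean_y, slope)
                for alpha in (0.0, 0.5, 1.0, 1.5)]
print(f"cosine(w, true direction) = {cosine:.3f}")
print("predicted year after edits:", [round(p) for p in edited_preds])
```

Because `w` has unit norm, shifting by `alpha` moves the projection score by exactly `alpha`, so the predicted year changes by `slope * alpha`; in this toy readout, monotonicity in the edit strength holds by construction. Whether a real LM's generated text follows such an edit is the empirical question the paper addresses.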