CAPE: Context-Adaptive Positional Encoding for Length Extrapolation
CoRR (2024)
Abstract
Positional encoding plays a crucial role in transformers, significantly
impacting model performance and length generalization. Prior research has
introduced absolute positional encoding (APE) and relative positional encoding
(RPE) to distinguish token positions in given sequences. However, both APE and
RPE remain fixed after model training regardless of input data, limiting their
adaptability and flexibility. Hence, we argue that the desired positional
encoding should be context-adaptive, adjusting dynamically according to the
given attention context. In this paper, we propose a Context-Adaptive Positional
Encoding (CAPE) method, which dynamically and semantically adjusts based on
input context and learned fixed priors. Experimental validation on real-world
datasets (Arxiv, Books3, and CHE) demonstrates that CAPE enhances model
performance on both the trained length and length generalization, and the
improvements are statistically significant. Visualizations suggest that our
model retains both local and anti-local information. Finally, we successfully
train the model on sequence length 128 and achieve better performance at an
evaluation sequence length of 8192 than other static positional encoding
methods, revealing the benefit of the adaptive positional encoding method.
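
Since the abstract only describes CAPE at a high level, the following PyTorch sketch illustrates the general idea of a context-adaptive positional bias: a learned fixed prior over relative distances is rescaled by a content-dependent gate before being added to the attention logits. The class name `ContextAdaptiveBias`, the gating network, and the mixing rule here are hypothetical illustrations of "adjusting based on input context and learned fixed priors", not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class ContextAdaptiveBias(nn.Module):
    """Illustrative positional bias: a learned fixed prior over relative
    distances, rescaled per query token by a content-dependent gate.
    (Hypothetical sketch; not the paper's exact CAPE parameterization.)"""

    def __init__(self, d_model: int, max_rel_dist: int = 512):
        super().__init__()
        # Learned static prior over relative distances (fixed after training),
        # analogous to a standard relative positional encoding table.
        self.prior = nn.Parameter(torch.zeros(2 * max_rel_dist - 1))
        # Maps each query token to a scalar gate, so the *context* decides
        # how strongly the static prior is applied at that position.
        self.gate = nn.Linear(d_model, 1)
        self.max_rel_dist = max_rel_dist

    def forward(self, q_tokens: torch.Tensor,
                attn_logits: torch.Tensor) -> torch.Tensor:
        # q_tokens: (batch, seq, d_model); attn_logits: (batch, seq, seq)
        seq = q_tokens.size(1)
        pos = torch.arange(seq, device=q_tokens.device)
        # Relative-distance matrix, clamped and shifted to index the prior table.
        rel = (pos[None, :] - pos[:, None]).clamp(
            -self.max_rel_dist + 1, self.max_rel_dist - 1
        ) + self.max_rel_dist - 1
        static_bias = self.prior[rel]                      # (seq, seq)
        gate = torch.sigmoid(self.gate(q_tokens))          # (batch, seq, 1)
        # Context-adaptive bias: a per-token gate rescales the static prior.
        return attn_logits + gate * static_bias.unsqueeze(0)

# Example usage: bias attention logits for a toy batch.
bias = ContextAdaptiveBias(d_model=64)
x = torch.randn(2, 16, 64)           # token representations
logits = torch.randn(2, 16, 16)      # raw attention logits
attn = torch.softmax(bias(x, logits), dim=-1)
```

Gating a fixed prior is just one way to realize the abstract's description of combining input context with learned fixed priors; the paper's actual adjustment rule may differ.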