Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement
arXiv (2024)
Abstract
Recently, many methods have been developed to extend the context length of
pre-trained large language models (LLMs), but they often require fine-tuning at
the target length (≫4K) and struggle to effectively utilize information
from the middle part of the context. To address these issues, we propose
Continuity-Relativity indExing with
gAussian Middle (CREAM), which interpolates positional
encodings by manipulating position indices. Apart from being simple, CREAM is
training-efficient: it only requires fine-tuning at the pre-trained context
window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context
length (e.g., 256K). To ensure that the model focuses more on the information in
the middle, we introduce a truncated Gaussian to encourage sampling from the
middle part of the context during fine-tuning, thus alleviating the
“Lost-in-the-Middle” problem faced by long-context LLMs. Experimental results
show that CREAM successfully extends LLMs to the target length for both Base
and Chat versions of Llama 2 with “Never Miss A Beat”. Our code
will be publicly available soon.
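
For intuition, here is a minimal sketch of the general idea behind "manipulating position indices": rescaling positions from a long target window into the range seen during pre-training. This is plain linear position interpolation for illustration only; the function name and the mapping are assumptions, not the exact CREAM continuity-relativity indexing scheme, which the abstract does not spell out.

```python
# Linear position interpolation: squeeze target-length indices into the
# pre-trained window. Illustrative only -- NOT the exact CREAM scheme.

def interpolate_positions(target_len: int, pretrained_len: int) -> list[float]:
    """Rescale indices 0..target_len-1 into [0, pretrained_len - 1]."""
    scale = (pretrained_len - 1) / (target_len - 1)
    return [i * scale for i in range(target_len)]

# Example: a 16K-token sequence mapped into a 4K pre-trained window.
positions = interpolate_positions(target_len=16384, pretrained_len=4096)
assert positions[0] == 0.0 and abs(positions[-1] - 4095.0) < 1e-6
```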
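Similarly, the truncated Gaussian used to bias fine-tuning samples toward the middle of the context can be sketched as follows. The mean and standard deviation choices, the rejection-sampling loop, and the names (`sample_middle_start`, `std_frac`) are illustrative assumptions; the paper's exact parameterization may differ.

```python
import random

def sample_middle_start(ctx_len: int, seg_len: int,
                        std_frac: float = 0.15) -> int:
    """Sample a segment start index concentrated around the middle.

    Draws from a Gaussian centered at the midpoint of the valid start
    range, truncated to [0, ctx_len - seg_len] by rejection sampling.
    Hypothetical parameterization, not necessarily the paper's.
    """
    hi = ctx_len - seg_len
    mu, sigma = hi / 2, std_frac * ctx_len
    while True:  # reject draws outside the valid truncation bounds
        start = random.gauss(mu, sigma)
        if 0 <= start <= hi:
            return int(start)

# Example: pick 512-token fine-tuning segments from a 4K context;
# start indices cluster near (4096 - 512) / 2 = 1792.
print([sample_middle_start(4096, 512) for _ in range(5)])
```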