Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
CoRR (2024)
Abstract
Recently, large language models (LLMs) have shown remarkable capabilities
including understanding context, engaging in logical reasoning, and generating
responses. However, this is achieved at the expense of stringent computational
and memory requirements, hindering their ability to effectively support long
input sequences. This survey provides an inclusive review of the recent
techniques and methods devised to extend the sequence length in LLMs, thereby
enhancing their capacity for long-context understanding. In particular, we
review and categorize a wide range of techniques including architectural
modifications, such as modified positional encoding and altered attention
mechanisms, which are designed to enhance the processing of longer sequences
while avoiding a proportional increase in computational requirements. The
diverse methodologies investigated in this study can be leveraged across
different phases of LLMs, i.e., training, fine-tuning and inference. This
enables LLMs to efficiently process extended sequences. The limitations of the
current methodologies are discussed in the last section, along with
suggestions for future research directions, underscoring the importance of
sequence length in the continued advancement of LLMs.