UniMem: Towards a Unified View of Long-Context Large Language Models
CoRR (2024)
Abstract
Long-context processing is a critical capability that constrains the
applicability of large language models (LLMs). Although various methods have
been devoted to enhancing the long-context processing ability of LLMs, they
have been developed in isolation, and their strengths lack systematic analysis
and integration, hindering further development. In
this paper, we introduce UniMem, a unified framework that reformulates existing
long-context methods from the view of memory augmentation of LLMs. UniMem is
characterized by four key dimensions: Memory Management, Memory Writing, Memory
Reading, and Memory Injection, providing a systematic theory for understanding
various long-context methods. We reformulate 16 existing methods based on
UniMem and translate four representative ones (Transformer-XL, Memorizing
Transformer, RMT, and Longformer) into equivalent UniMem forms to reveal their
design principles and strengths. Based on these analyses, we propose UniMix, an
innovative approach that integrates the strengths of these algorithms.
Experimental results show that UniMix achieves superior performance in handling
long contexts with significantly lower perplexity than baselines.
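The abstract's four dimensions (Memory Management, Memory Writing, Memory Reading, and Memory Injection) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the FIFO eviction policy, the toy dot-product retrieval, and the averaging fusion are all illustrative assumptions standing in for the design choices that each concrete method (e.g. Transformer-XL's caching vs. Memorizing Transformer's kNN lookup) makes along these dimensions.

```python
from collections import deque

class UniMemSketch:
    """Illustrative sketch of UniMem's four dimensions (hypothetical,
    not the paper's code).

    - Memory Management: fixed-capacity buffer with FIFO eviction.
    - Memory Writing: how new segment representations enter the buffer.
    - Memory Reading: how stored entries are retrieved for a query.
    - Memory Injection: how retrieved memory is fused with the current state.
    """

    def __init__(self, capacity=4):
        # Memory Management: bounded deque evicts the oldest entry.
        self.memory = deque(maxlen=capacity)

    def write(self, segment):
        # Memory Writing: store a processed segment's representation.
        self.memory.append(segment)

    def read(self, query, top_k=2):
        # Memory Reading: rank entries by a toy dot-product similarity
        # and return the top_k matches.
        scored = sorted(
            self.memory,
            key=lambda m: sum(q * v for q, v in zip(query, m)),
            reverse=True,
        )
        return scored[:top_k]

    def inject(self, hidden, retrieved):
        # Memory Injection: fuse retrieved entries into the current
        # hidden state via a toy elementwise average.
        if not retrieved:
            return hidden
        fused = list(hidden)
        for mem in retrieved:
            fused = [h + m for h, m in zip(fused, mem)]
        n = 1 + len(retrieved)
        return [x / n for x in fused]
```

Under this framing, a method like Transformer-XL corresponds to one fixed choice in each slot (cache recent activations, read them all, inject via attention), whereas a hybrid like UniMix would mix the stronger choice per dimension.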