Scaling Transformer to 1M tokens and beyond with RMT
arXiv (Cornell University), 2023
Abstract
A major limitation for the broader scope of problems solvable by transformers
is the quadratic scaling of computational complexity with input size. In this
study, we investigate the recurrent memory augmentation of pre-trained
transformer models to extend input context length while linearly scaling
compute. Our approach demonstrates the capability to store information in
memory for sequences of up to an unprecedented two million tokens while
maintaining high retrieval accuracy. Experiments with language modeling tasks
show perplexity improvement as the number of processed input segments
increases. These results underscore the effectiveness of our method, which has
significant potential to enhance long-term dependency handling in natural
language understanding and generation tasks, as well as enable large-scale
context processing for memory-intensive applications.
Keywords
RMT, Transformer
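The abstract describes segment-level recurrence with memory tokens: a long input is split into segments, each processed together with a small set of memory tokens whose updated states are passed to the next segment, so compute grows linearly with input length. The sketch below illustrates this idea only; the module structure, parameter names, and the use of a generic PyTorch encoder are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch of RMT-style segment-level recurrence.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class RecurrentMemorySketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2,
                 num_mem_tokens=8, vocab_size=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        # Learnable memory tokens prepended to every segment.
        self.mem_init = nn.Parameter(torch.randn(num_mem_tokens, d_model))
        self.num_mem_tokens = num_mem_tokens

    def forward(self, token_ids, segment_len=128):
        # token_ids: (batch, total_len). Segments are processed one at a
        # time, so compute scales linearly with total_len rather than
        # quadratically.
        batch = token_ids.size(0)
        memory = self.mem_init.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for start in range(0, token_ids.size(1), segment_len):
            seg = self.embed(token_ids[:, start:start + segment_len])
            # Run the backbone over [memory tokens ; segment tokens].
            hidden = self.backbone(torch.cat([memory, seg], dim=1))
            # Updated memory states carry information to the next segment.
            memory = hidden[:, :self.num_mem_tokens]
            outputs.append(hidden[:, self.num_mem_tokens:])
        return torch.cat(outputs, dim=1)

# Usage: a 1024-token input processed as eight 128-token segments.
model = RecurrentMemorySketch()
ids = torch.randint(0, 1000, (2, 1024))
out = model(ids)  # shape: (2, 1024, 256)
```

Because only the fixed-size memory states cross segment boundaries, the attention cost per segment stays constant regardless of how many segments the full sequence contains.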