Harnessing Overlap in Blockwise Transformers for Near-Infinite Context

Hao Liu, Matei Zaharia, Pieter Abbeel

ICLR 2024 (2024)

Abstract
Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands imposed by Transformers limit their ability to handle long sequences, thereby creating challenges for tasks involving extended sequences or long-term dependencies. We present a distinct approach, Ring Attention, which leverages blockwise computation of self-attention to distribute long sequences across multiple devices while concurrently overlapping the communication of key-value blocks between devices with blockwise attention computation. By processing longer input sequences while maintaining memory efficiency, Ring Attention enables training and inference of sequences that exceed 100 million tokens in length, allowing sequence length to scale proportionally with the number of devices and effectively eliminating the memory constraints imposed by individual devices. Extensive experiments on language modeling tasks demonstrate the effectiveness of Ring Attention in reducing memory requirements and improving performance.
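The following is a minimal single-host sketch in JAX of the idea the abstract describes: each device holds one query block, computes blockwise attention against the key-value block it currently has (with a streaming softmax so results match full attention), and passes its key-value block to the next device around a ring. This is an illustrative assumption-laden sketch, not the paper's reference implementation: real inter-device transfer (which the paper overlaps with compute) is emulated here with jnp.roll over a leading "device" axis, and all names, shapes, and block sizes are chosen for clarity.

import jax
import jax.numpy as jnp

def blockwise_attn_step(q, k, v, acc, row_max, row_sum):
    # One blockwise attention step with an online (streaming) softmax:
    # keep a running max and running sum so partial results can be
    # combined across key-value blocks without materializing full scores.
    scores = q @ k.T / jnp.sqrt(q.shape[-1])            # (q_blk, kv_blk)
    blk_max = scores.max(axis=-1, keepdims=True)
    new_max = jnp.maximum(row_max, blk_max)
    correction = jnp.exp(row_max - new_max)             # rescale old statistics
    p = jnp.exp(scores - new_max)
    acc = acc * correction + p @ v
    row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
    return acc, new_max, row_sum

def ring_attention(q_blocks, k_blocks, v_blocks):
    # q/k/v_blocks: (num_devices, block_len, dim); each leading index
    # plays the role of one device holding one block of the sequence.
    n_dev, blk, dim = q_blocks.shape
    acc = jnp.zeros((n_dev, blk, dim))
    row_max = jnp.full((n_dev, blk, 1), -jnp.inf)
    row_sum = jnp.zeros((n_dev, blk, 1))
    k_cur, v_cur = k_blocks, v_blocks
    for _ in range(n_dev):
        # Every "device" attends to the KV block it currently holds; in the
        # distributed setting the next KV block is exchanged with the ring
        # neighbor while this compute runs, hiding communication cost.
        acc, row_max, row_sum = jax.vmap(blockwise_attn_step)(
            q_blocks, k_cur, v_cur, acc, row_max, row_sum)
        k_cur = jnp.roll(k_cur, shift=1, axis=0)         # pass KV around the ring
        v_cur = jnp.roll(v_cur, shift=1, axis=0)
    return acc / row_sum

# Usage: 4 emulated devices, each holding a 128-token block of a 512-token sequence.
key = jax.random.PRNGKey(0)
q, k, v = (jax.random.normal(k_, (4, 128, 64)) for k_ in jax.random.split(key, 3))
out = ring_attention(q, k, v)    # equals full (non-causal) softmax attention over all 512 tokens
print(out.shape)                 # (4, 128, 64)

After n_dev ring steps each query block has attended to every key-value block exactly once, so per-device memory stays constant in the number of blocks while the attainable sequence length grows with the number of devices.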
Keywords
Language Model, Long Context Modeling, Reinforcement Learning, Unsupervised Reinforcement Learning