MambaByte: Token-free Selective State Space Model
arXiv (2024)
Abstract
Token-free language models learn directly from raw bytes and remove the
inductive bias of subword tokenization. Operating on bytes, however, results in
significantly longer sequences. In this setting, standard autoregressive
Transformers scale poorly as the effective memory required grows with sequence
length. The recent development of the Mamba state space model (SSM) offers an
appealing alternative approach with a fixed-sized memory state and efficient
decoding. We propose MambaByte, a token-free adaptation of the Mamba SSM
trained autoregressively on byte sequences. In terms of modeling, we show
MambaByte to be competitive with, and even to outperform, state-of-the-art
subword Transformers on language modeling tasks while maintaining the benefits
of token-free language models, such as robustness to noise. In terms of
efficiency, we develop an adaptation of speculative decoding with tokenized
drafting and byte-level verification. This results in a 2.6× inference
speedup to the standard MambaByte implementation, showing similar decoding
efficiency as the subword Mamba. These findings establish the viability of SSMs
in enabling token-free language modeling.
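The draft-and-verify scheme mentioned above can be sketched as follows. This is a minimal, hedged illustration of greedy speculative decoding with subword drafting and byte-level verification; the functions draft_subwords and verify_next_byte are toy stand-ins introduced for this sketch, not the paper's actual subword-Mamba drafter or MambaByte verifier.

```python
# Toy sketch of speculative decoding with subword drafting and byte-level
# verification. The drafter and verifier below are illustrative placeholders,
# not the paper's models.

TARGET = b"Mamba is a selective state space model."


def draft_subwords(prefix: bytes, n_tokens: int) -> bytes:
    """Toy subword drafter: cheaply proposes a few multi-byte chunks."""
    # Placeholder proposal; a real drafter would be a fast subword LM.
    return b" selective model"[: 4 * n_tokens]


def verify_next_byte(context: bytes) -> int:
    """Toy byte-level verifier: deterministically continues TARGET."""
    # Placeholder; a real verifier is the byte-level model, which can score
    # all drafted positions in parallel.
    return TARGET[len(context)] if len(context) < len(TARGET) else ord(".")


def speculative_step(prefix: bytes, n_draft_tokens: int = 4) -> bytes:
    """Draft several subword tokens, then accept the longest byte prefix the
    byte-level model agrees with, correcting the first mismatch."""
    draft = draft_subwords(prefix, n_draft_tokens)
    accepted = bytearray()
    context = prefix
    for b in draft:
        expected = verify_next_byte(context)
        if b != expected:
            accepted.append(expected)  # replace the first rejected byte and stop
            break
        accepted.append(b)
        context += bytes([b])
    else:
        accepted.append(verify_next_byte(context))  # bonus byte if all accepted
    return prefix + bytes(accepted)


if __name__ == "__main__":
    print(speculative_step(b"Mamba is a"))  # accepts " selective " then corrects
```

In this sketch, several drafted bytes are accepted per verification round, which is where the decoding speedup over plain byte-by-byte generation comes from.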