RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers
arXiv (2024)
Abstract
The Transformer architecture has achieved great success in multiple applied machine
learning communities, such as natural language processing (NLP), computer
vision (CV), and information retrieval (IR). Its core mechanism, attention,
requires O(n^2) time complexity in training and O(n) time complexity in
inference. Many works have been proposed to improve the attention mechanism's
scalability, such as Flash Attention and Multi-query Attention. A different
line of work aims to design new mechanisms to replace attention entirely.
Recently, a notable model architecture, Mamba, which is based on state space
models, has achieved transformer-equivalent performance on multiple sequence
modeling tasks.
In this work, we examine Mamba's efficacy through the lens of a classical IR
task: document ranking. A reranker model takes a query and a document as
input and predicts a scalar relevance score. This task demands the language
model's ability to comprehend lengthy contextual inputs and to capture the
interaction between query and document tokens. We find that (1) Mamba models
achieve competitive performance compared to transformer-based models with the
same training recipe; (2) but they also have lower training throughput than
efficient transformer implementations such as Flash Attention. We hope this
study can serve as a starting point for exploring Mamba models in other
classical IR tasks. Our code implementation and trained checkpoints are made
public to facilitate reproducibility
(https://github.com/zhichaoxu-shufe/RankMamba).
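To make the reranking setup concrete, below is a minimal sketch of how a cross-encoder reranker produces a scalar relevance score for a (query, document) pair, using a generic Hugging Face sequence-classification backbone. The backbone name, query, and document text here are illustrative assumptions, not the paper's actual training setup or released checkpoints.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative backbone only; the paper's released checkpoints are at
# https://github.com/zhichaoxu-shufe/RankMamba and may use different bases.
model_name = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# A single regression head (num_labels=1) turns the encoder into a reranker
# that emits one scalar relevance score per (query, document) pair.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()

query = "what is a state space model"
document = "State space models describe a system whose latent state evolves over time."

# Query and document are concatenated into one input sequence so the model can
# capture token-level interactions between them (via attention here; a Mamba
# backbone would process the same concatenated sequence recurrently).
inputs = tokenizer(query, document, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits.squeeze(-1)  # scalar relevance score

print(float(score))
```

In practice such a model would be fine-tuned with a ranking objective so that relevant documents receive higher scores than non-relevant ones for the same query; the snippet above only shows the scoring interface.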