GrootVL: Tree Topology is All You Need in State Space Model
arXiv (2024)
Abstract
State space models, which employ recursively propagated features,
demonstrate strong representation capabilities comparable to those of
Transformer models, along with superior efficiency. However, constrained by
the inherent geometric structure of sequences, they still fall short in
modeling long-range dependencies. To address this issue, we propose the GrootVL network, which
first dynamically generates a tree topology based on spatial relationships and
input features. Then, feature propagation is performed based on this graph,
thereby breaking the original sequence constraints to achieve stronger
representation capabilities. Additionally, we introduce a linear-complexity
dynamic programming algorithm to enhance long-range interactions without
increasing computational cost. GrootVL is a versatile multimodal framework that
can be applied to both visual and textual tasks. Extensive experiments
demonstrate that our method significantly outperforms existing structured state
space models on image classification, object detection and segmentation.
In addition, by fine-tuning large language models, our approach achieves consistent
improvements on multiple textual tasks at minor training cost.
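
The abstract does not say how the tree is constructed or how features are aggregated along it, so the following is only a minimal sketch under stated assumptions: the tree is taken to be a minimum spanning tree over a 4-connected pixel grid, features are scalars, and each edge carries a weight exp(-dissimilarity). The function names (`grid_edges`, `minimum_spanning_tree`, `tree_propagate`) are hypothetical and are not taken from the paper. The sketch illustrates how a two-pass dynamic program can let every node aggregate from every other node along tree paths in time linear in the number of nodes.

```python
import math
from collections import deque


def grid_edges(h, w, feat):
    """4-connected grid edges as (dissimilarity, u, v).

    Assumption: feat is a flat, row-major list of scalar features.
    """
    edges = []
    for r in range(h):
        for c in range(w):
            u = r * w + c
            if c + 1 < w:
                edges.append((abs(feat[u] - feat[u + 1]), u, u + 1))
            if r + 1 < h:
                edges.append((abs(feat[u] - feat[u + w]), u, u + w))
    return edges


def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm with union-find; returns an adjacency list."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    adj = [[] for _ in range(n)]
    for d, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            adj[u].append((v, d))
            adj[v].append((u, d))
    return adj


def tree_propagate(adj, x, root=0):
    """For every node i, return sum_j (product of edge weights on path i..j) * x[j].

    Assumption: edge weight = exp(-dissimilarity). Runs in O(n) on a tree.
    """
    n = len(x)
    par = [-1] * n
    wt = [0.0] * n  # weight of the edge between a node and its parent
    seen = [False] * n
    seen[root] = True
    order = []
    queue = deque([root])
    while queue:  # BFS gives a parent-before-child ordering
        u = queue.popleft()
        order.append(u)
        for v, d in adj[u]:
            if not seen[v]:
                seen[v] = True
                par[v] = u
                wt[v] = math.exp(-d)
                queue.append(v)

    # Pass 1 (leaves to root): aggregate each subtree into its root.
    up = list(x)
    for u in reversed(order[1:]):
        up[par[u]] += wt[u] * up[u]

    # Pass 2 (root to leaves): add the contribution from outside the subtree.
    out = up[:]
    for u in order[1:]:
        p = par[u]
        out[u] = up[u] + wt[u] * (out[p] - wt[u] * up[u])
    return out
```

As a usage example, `tree_propagate(minimum_spanning_tree(4, grid_edges(2, 2, feat)), feat)` with `feat = [0.0, 0.0, 5.0, 5.0]` connects the two similar pixel pairs with weight 1 and links the pairs through a single weak edge, so each pixel aggregates mostly from its similar neighbour. A real model would use vector features and learned, input-dependent weights; this sketch only shows the shape of the computation.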