EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction
CoRR (2022)
Abstract
High-resolution dense prediction enables many appealing real-world
applications, such as computational photography, autonomous driving, etc.
However, the vast computational cost makes deploying state-of-the-art
high-resolution dense prediction models on hardware devices difficult. This
work presents EfficientViT, a new family of high-resolution vision models with
novel multi-scale linear attention. Unlike prior high-resolution dense
prediction models that rely on heavy softmax attention, hardware-inefficient
large-kernel convolution, or complicated topology structure to obtain good
performances, our multi-scale linear attention achieves the global receptive
field and multi-scale learning (two desirable features for high-resolution
dense prediction) with only lightweight and hardware-efficient operations. As
such, EfficientViT delivers remarkable performance gains over previous
state-of-the-art models with significant speedup on diverse hardware platforms,
including mobile CPU, edge GPU, and cloud GPU. Without performance loss on
Cityscapes, our EfficientViT provides up to 13.9× and 6.2× GPU
latency reduction over SegFormer and SegNeXt, respectively. For
super-resolution, EfficientViT delivers up to 6.4× speedup over Restormer while
providing a 0.11 dB gain in PSNR. For Segment Anything, EfficientViT delivers
48.9× higher throughput on A100 GPU while achieving slightly better zero-shot
instance segmentation performance on COCO.
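The abstract's core claim is that a global receptive field can be obtained with lightweight operations by replacing softmax attention with a linear-attention kernel. Below is a minimal NumPy sketch of ReLU-kernel linear attention, assuming single-head, unbatched inputs; the function name and shapes are illustrative, not the paper's actual implementation. By associating the matrix products as `phi(Q) (phi(K)^T V)`, the cost drops from O(N²·d) to O(N·d²) in sequence length N.

```python
import numpy as np

def relu_linear_attention(Q, K, V, eps=1e-6):
    """Linear attention with a ReLU feature map (illustrative sketch).

    Q, K: (N, d) queries and keys; V: (N, d_v) values.
    Computes phi(Q) @ (phi(K).T @ V), normalized per query,
    avoiding the explicit (N, N) attention matrix.
    """
    Qp = np.maximum(Q, 0.0)          # phi(Q): ReLU feature map
    Kp = np.maximum(K, 0.0)          # phi(K)
    kv = Kp.T @ V                    # (d, d_v) key-value summary
    z = Kp.sum(axis=0)               # (d,) normalizer accumulator
    num = Qp @ kv                    # (N, d_v) unnormalized output
    den = Qp @ z + eps               # (N,) per-query normalizer
    return num / den[:, None]

# Tiny usage example with random inputs
rng = np.random.default_rng(0)
N, d = 8, 4
Q = rng.standard_normal((N, d))
K = rng.standard_normal((N, d))
V = rng.standard_normal((N, d))
out = relu_linear_attention(Q, K, V)
```

The result matches the quadratic form `(phi(Q) phi(K)^T) V` with row-wise normalization, but never materializes the N×N matrix, which is what makes the operation hardware-friendly at high resolution.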
Keywords
attention, efficientvit, prediction, multi-scale, high-resolution