
EdgeViT: Efficient Visual Modeling for Edge Computing

Wireless Algorithms, Systems, and Applications (2022)

Abstract
With the rapid growth of edge intelligence, deep neural networks must compute ever more efficiently. Visual intelligence, as a core component of artificial intelligence, particularly merits further exploration. As the cornerstone of modern visual modeling, convolutional neural networks (CNNs) have developed greatly over the past decades, and lightweight CNN variants have been proposed to address the challenge of heavy computation in mobile settings. Although CNNs' spatial inductive biases allow them to learn representations with fewer parameters across different vision tasks, these models are spatially local. To reach the next level of model performance, the vision transformer (ViT) is now a viable alternative thanks to the potential of its multi-head attention mechanism. In this work, we introduce EdgeViT, an accelerated deep visual modeling method that combines the benefits of CNNs and ViTs in a lightweight, edge-friendly manner. On the ImageNet-1k dataset, our proposed method achieves a top-1 accuracy of 77.8% using only 2.3 million parameters and 79.2% using 5.6 million parameters. On PASCAL VOC segmentation it achieves an mIoU of up to 78.3 while using only 3.1 million parameters, half the parameter budget of MobileViT.
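The abstract's emphasis on parameter budgets reflects a standard trick in lightweight CNN design: replacing a standard convolution with a depthwise-separable one. The sketch below is not EdgeViT's actual architecture (the paper's blocks are not specified here); it is only a minimal, assumed illustration of why such factorized layers need far fewer parameters, counted with plain arithmetic.

```python
def conv_params(c_in, c_out, k):
    # Standard convolution: one k x k filter per (input, output) channel
    # pair, plus one bias per output channel.
    return c_in * c_out * k * k + c_out

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise k x k convolution: one filter (and bias) per input channel.
    depthwise = c_in * k * k + c_in
    # Pointwise 1 x 1 convolution mixes channels: c_in weights per output
    # channel, plus bias.
    pointwise = c_in * c_out + c_out
    return depthwise + pointwise

# Example layer sizes (hypothetical, not taken from the paper):
print(conv_params(64, 128, 3))                # 73856
print(depthwise_separable_params(64, 128, 3))  # 8960
```

For this layer the factorized form uses roughly 8x fewer parameters, which is the kind of saving that lets a hybrid CNN/ViT model fit within a few million parameters overall.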
Key words
Edge computing, Vision transformer, Lite computation