Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
CVPR 2024
Abstract
Recent Vision Transformer Compression (VTC) works mainly follow a two-stage
scheme, where the importance score of each model unit is first evaluated or
preset in each submodule, followed by the sparsity score evaluation according
to the target sparsity constraint. Such a separate evaluation process induces
a gap between the importance and sparsity score distributions, thus causing high
search costs for VTC. In this work, for the first time, we investigate how to
integrate the evaluations of importance and sparsity scores into a single
stage, searching for the optimal subnets efficiently. Specifically, we
present Once for Both (OFB), a cost-efficient VTC approach that simultaneously
evaluates both importance and sparsity scores. First, a
bi-mask scheme is developed by entangling the importance score and the
differentiable sparsity score to jointly determine the pruning potential
(prunability) of each unit. This bi-mask search strategy is then combined
with a proposed adaptive one-hot loss to realize a progressive and efficient
search for the most important subnet. Finally, Progressive Masked Image
Modeling (PMIM) is proposed to regularize the feature space, which may be
degraded by dimension reduction, so that it is more representative during the
search process. Extensive experiments demonstrate that OFB
can achieve superior compression performance over state-of-the-art
search-based and pruning-based methods across various Vision Transformer
architectures, while significantly improving search efficiency, e.g.,
requiring one GPU-day of search to compress DeiT-S on ImageNet-1K.
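
The bi-mask idea described above can be illustrated with a toy sketch. The abstract does not say how the importance score and the differentiable sparsity score are entangled, so this sketch assumes a simple product of the importance score with a sigmoid of a sparsity logit. The names `bi_mask` and `select_units`, the product form, and the top-k selection rule are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bi_mask(importance, sparsity_logits):
    """Combine the two scores into one soft mask value per unit.

    ASSUMPTION: the combination is a plain product of the importance
    score and sigmoid(sparsity logit); the paper may use another form.
    """
    return [s * sigmoid(a) for s, a in zip(importance, sparsity_logits)]


def select_units(importance, sparsity_logits, keep_ratio):
    """Return the indices of the units kept under a target keep ratio.

    ASSUMPTION: units are ranked by the combined mask value and the
    top fraction is kept; this stands in for the actual search.
    """
    mask = bi_mask(importance, sparsity_logits)
    k = max(1, round(keep_ratio * len(mask)))
    ranked = sorted(range(len(mask)), key=lambda i: mask[i], reverse=True)
    return sorted(ranked[:k])


# Four hypothetical units (e.g., attention heads or channels).
importance = [0.9, 0.2, 0.6, 0.4]
sparsity_logits = [2.0, 2.0, -3.0, 1.0]
print(select_units(importance, sparsity_logits, keep_ratio=0.5))  # [0, 3]
```

In this toy case, unit 2 has a fairly high importance score but a strongly negative sparsity logit, so its combined mask value is the lowest and it is pruned. This shows how a jointly determined prunability can differ from a ranking by importance alone.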