TransVLAD: Focusing on Locally Aggregated Descriptors for Few-Shot Learning.

European Conference on Computer Vision(2022)

引用 5|浏览27
暂无评分
摘要
This paper presents a transformer framework for few-shot learning, termed TransVLAD, with one focus showing the power of locally aggregated descriptors for few-shot learning. Our TransVLAD model is simple: a standard transformer encoder following a NeXtVLAD aggregation module to output the locally aggregated descriptors. In contrast to the prevailing use of CNN as part of the feature extractor, we are the first to prove self-supervised learning like masked autoencoders (MAE) can deal with the overfitting of transformers in few-shot image classification. Besides, few-shot learning can benefit from this general-purpose pre-training. Then, we propose two methods to mitigate few-shot biases, supervision bias and simple-characteristic bias. The first method is introducing masking operation into fine-tuning, by which we accelerate fine-tuning (by more than 3x) and improve accuracy. The second one is adapting focal loss into soft focal loss to focus on hard characteristics learning. Our TransVLAD finally tops 10 benchmarks on five popular few-shot datasets by an average of more than 2%.
更多
查看译文
关键词
Few-shot learning, Transformers, Self-supervised learning, NeXtVLAD, Focal loss
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要