Generalized orderless pooling performs implicit salient matching

2017 IEEE International Conference on Computer Vision (ICCV), 2017

Abstract
Most recent CNN architectures use average pooling as a final feature encoding step. In the field of fine-grained recognition, however, recent global representations like bilinear pooling offer improved performance. In this paper, we generalize average and bilinear pooling to "alpha-pooling", which allows the pooling strategy to be learned during training. In addition, we present a novel way to visualize decisions made by these approaches: we identify the parts of training images that have the highest influence on the prediction for a given test image. This makes it possible to justify decisions to users and to analyze the influence of semantic parts. For example, we can show that the higher-capacity VGG16 model focuses much more on the bird's head than, e.g., the lower-capacity VGG-M model when recognizing fine-grained bird categories. Both contributions allow us to analyze the differences that arise when moving between average and bilinear pooling. In addition, experiments show that our generalized approach can outperform both average and bilinear pooling across a variety of standard datasets.
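The abstract does not state the pooling formula, so the following is only a minimal sketch of how a single parameter could interpolate between the two schemes. It assumes the pooled descriptor is the mean over local features y of the outer product (sign(y) * |y|^(alpha - 1)) y^T; this parameterization, the function name alpha_pool, and the toy inputs are illustrative assumptions, not taken from the text above. Under that assumption, alpha = 1 with strictly positive features reproduces average pooling (repeated in every row), and alpha = 2 reproduces bilinear pooling.

```python
import math


def alpha_pool(features, alpha):
    """Pool n local descriptors (each a list of d floats) into a d x d matrix.

    Assumed form (not stated in the abstract):
        mean over y of  (sign(y) * |y|**(alpha - 1))  outer  y
    """
    n = len(features)
    d = len(features[0])
    pooled = [[0.0] * d for _ in range(d)]
    for y in features:
        # Signed element-wise power; zeros stay zero to avoid 0 ** negative.
        left = [
            math.copysign(abs(v) ** (alpha - 1), v) if v != 0 else 0.0
            for v in y
        ]
        for r in range(d):
            for c in range(d):
                pooled[r][c] += left[r] * y[c] / n
    return pooled


# Toy input: two local descriptors of dimension 2 (strictly positive).
toy = [[1.0, 2.0], [3.0, 4.0]]
avg_like = alpha_pool(toy, 1)       # every row equals the feature mean
bilinear_like = alpha_pool(toy, 2)  # mean of the outer products y y^T
```

In a trainable setting, alpha would be a learnable scalar updated by backpropagation together with the network weights; the pure-Python loop here is only meant to make the two limiting cases easy to check by hand.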
Keywords
α-pooling, pooling strategy, training images, semantic parts, generalized approach, average pooling, final feature encoding step, fine-grained recognition, global representations, test image, CNN architectures, bilinear pooling, VGG16 model, VGG-M model, fine-grained bird categories recognition, generalized orderless pooling, implicit salient matching, decision visualization