I read, I saw, I tell: Texts Assisted Fine-Grained Visual Classification

MM '18: ACM Multimedia Conference, Seoul, Republic of Korea, October 2018

Abstract
In visual classification tasks, it is hard to tell the subtle differences between one species and other similar breeds. This challenging problem is generally known as Fine-Grained Visual Classification (FGVC). In this paper, we propose a novel FGVC approach called Texts Assisted Fine-Grained Visual Classification (TA-FGVC). TA-FGVC reads texts to gain attention, sees images with the gained attention, and then tells the subtle differences. Technically, we propose a deep neural network that learns a visual-semantic embedding model. The proposed deep architecture consists mainly of two parts: one for visual localization, and the other for visual-to-semantic projection. The model is fed with both visual features extracted from raw images and semantic information learned from two sources: unannotated texts and image attributes. At the last layer of the model, each image is embedded into a semantic space related to the class labels. Finally, the categorization results from the visual stream and the visual-semantic stream are combined to reach the ultimate decision. Extensive experiments on open standard benchmarks verify the superiority of our model over several state-of-the-art methods.
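The abstract only sketches the architecture. As a rough, non-authoritative illustration of the two-stream design it describes, the following minimal PyTorch sketch shows a visual stream that classifies directly and a visual-semantic stream that projects image features into a semantic space and scores classes by similarity to per-class semantic vectors, with both streams' outputs fused for the final decision. All names, dimensions, and the fusion rule here are hypothetical assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamFGVC(nn.Module):
    """Hypothetical sketch of the two-stream idea in the abstract: a visual
    stream predicts classes directly, while a visual-semantic stream projects
    image features into a semantic space and scores each class by cosine
    similarity to its semantic vector (in the paper, such vectors would be
    derived from unannotated texts and image attributes)."""

    def __init__(self, backbone: nn.Module, feat_dim: int,
                 class_embeddings: torch.Tensor, alpha: float = 0.5):
        super().__init__()
        self.backbone = backbone                            # visual feature extractor
        num_classes, sem_dim = class_embeddings.shape
        self.visual_head = nn.Linear(feat_dim, num_classes) # visual stream classifier
        self.projection = nn.Linear(feat_dim, sem_dim)      # visual-to-semantic projection
        # Per-class semantic vectors, L2-normalized and kept fixed for simplicity.
        self.register_buffer("class_emb", F.normalize(class_embeddings, dim=1))
        self.alpha = alpha                                  # fusion weight between streams

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images)                       # (B, feat_dim)
        visual_probs = F.softmax(self.visual_head(feats), dim=1)
        sem = F.normalize(self.projection(feats), dim=1)    # (B, sem_dim)
        # Cosine similarity of each embedded image to every class vector.
        semantic_probs = F.softmax(sem @ self.class_emb.t(), dim=1)
        # Combine the two streams' predictions for the ultimate decision.
        return self.alpha * visual_probs + (1 - self.alpha) * semantic_probs
```

A toy usage with stand-in components (a flat linear "backbone" and random semantic vectors for 200 classes):

```python
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512))
class_emb = torch.randn(200, 300)
model = TwoStreamFGVC(backbone, feat_dim=512, class_embeddings=class_emb)
pred = model(torch.randn(4, 3, 64, 64)).argmax(dim=1)   # predicted class ids
```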
Keywords
Fine-grained visual classification, multi-modal analysis, deep learning, transfer learning