Multimodal Product Identification - Submission to Watch and Buy 2021 Challenge.

WAB @ ACM Multimedia (2021)

Abstract
This technical report gives an overview of our approach to the "Watch and Buy: Multimodal Product Identification Challenge". We tackle the problem with a three-stage framework: product detection, retrieval, and classification. For product detection, we use Cascade R-CNN with deformable convolution to alleviate the impact of image distortion. For product retrieval, we enhance the Multiple Granularity Network (MGN) with global and local context through IBN, SE, and Non-local blocks. Product classification suffers from fashion variation; to address this, we fuse the global feature of the whole image with the local feature of the product. Experiments show that our method is competitive with state-of-the-art methods, and our overall approach achieves an F1 score of 0.648, ranking second in the final challenge.
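The abstract does not say how the global and local features are fused for classification. As a minimal sketch, assuming the simplest option (L2-normalize each feature and concatenate them), the step might look like the following. The function names and the placeholder vectors are hypothetical and are not taken from the report.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; return a copy unchanged if the norm is zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_features(global_feat, local_feat):
    """Hypothetical fusion (an assumption, not the authors' stated method):
    normalize each feature so neither dominates, then concatenate."""
    return l2_normalize(global_feat) + l2_normalize(local_feat)


# Placeholder vectors standing in for network outputs.
global_feat = [3.0, 4.0]       # feature of the whole image
local_feat = [0.0, 5.0, 0.0]   # feature of the detected product crop
fused = fuse_features(global_feat, local_feat)
```

Normalizing before concatenation keeps the two feature sources on a comparable scale; other fusion schemes (weighted sums, learned attention) are equally plausible readings of the abstract.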