Rich Image Captioning in the Wild
2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2016
Abstract
We present an image captioning system that addresses new challenges of automatically describing images in the wild. These challenges include generating high-quality captions with respect to human judgments, handling out-of-domain data, and meeting the low latency required in many applications. Building on a state-of-the-art framework, we developed a deep vision model that detects a broad range of visual concepts, an entity recognition model that identifies celebrities and landmarks, and a confidence model for the caption output. Experimental results show that our caption engine significantly outperforms previous state-of-the-art systems on both an in-domain dataset (MS COCO) and out-of-domain datasets. We also make the system publicly accessible as part of Microsoft Cognitive Services.
Keywords
image captioning, high quality caption, human judgments, out-of-domain data handling, deep vision model, visual concepts, entity recognition model, confidence model, in-domain dataset, MS COCO, Microsoft Cognitive Services