Multimodal Reconstruction Using Vector Representation

2018 25th IEEE International Conference on Image Processing (ICIP)

Cited by 23 | Views 23
Abstract
Recent work has demonstrated that neural embeddings from multiple modalities can be used to focus the results of generative adversarial networks. However, little work has been done on developing a procedure for combining vectors from different modalities for the purpose of reconstructing the input; typically, embeddings from different modalities are simply concatenated into a larger input vector. In this paper, we propose learning a Common Vector Space (CVS) in which similar inputs from different modalities cluster together. We develop a framework to analyze the extent of reconstruction and the robustness offered by the CVS. We apply the CVS to annotating, generating, and captioning images on MS-COCO, and show that it is on par with existing multimodal embedding techniques while offering more flexibility as the number of modalities increases.
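As a rough illustration of the idea of mapping several modalities into one shared space (as opposed to concatenating their embeddings), the sketch below projects precomputed image and text embeddings into a common vector space and pulls matched pairs together with a cosine objective. This is a minimal, assumed PyTorch formulation; the module name, dimensions, and loss are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonVectorSpace(nn.Module):
    """Hypothetical sketch: project each modality into a single shared space."""
    def __init__(self, img_dim=2048, txt_dim=768, cvs_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, cvs_dim)  # image-embedding projection
        self.txt_proj = nn.Linear(txt_dim, cvs_dim)  # text-embedding projection

    def forward(self, img_emb, txt_emb):
        # L2-normalize so matched pairs can be compared by cosine similarity
        z_img = F.normalize(self.img_proj(img_emb), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_emb), dim=-1)
        return z_img, z_txt

def alignment_loss(z_img, z_txt):
    """Pull paired image/text vectors together (simple cosine objective)."""
    target = torch.ones(z_img.size(0))  # every pair in the batch is a match
    return F.cosine_embedding_loss(z_img, z_txt, target)

# Usage with placeholder embeddings standing in for any image / text encoder
img_emb = torch.randn(8, 2048)
txt_emb = torch.randn(8, 768)
model = CommonVectorSpace()
z_img, z_txt = model(img_emb, txt_emb)
loss = alignment_loss(z_img, z_txt)
loss.backward()
```

Because each modality only needs its own projection into the shared space, adding a further modality means adding one more projection head rather than growing a concatenated input vector, which is the flexibility the abstract refers to.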
Keywords
annotating and captioning images, multimodal reconstruction, vector representation, neural embedding, generative adversarial networks, Common Vector Space