Multimodal Reconstruction Using Vector Representation

2018 25th IEEE International Conference on Image Processing (ICIP)

Cited by 23 | Views 23
Abstract
Recent work has demonstrated that neural embeddings from multiple modalities can be used to focus the results of generative adversarial networks. However, little work has been done on developing a procedure for combining vectors from different modalities for the purpose of reconstructing the input; typically, embeddings from different modalities are simply concatenated into a larger input vector. In this paper, we propose learning a Common Vector Space (CVS) in which similar inputs from different modalities cluster together. We develop a framework to analyze the extent of reconstruction and the robustness offered by the CVS. We apply the CVS to annotating, generating, and captioning images on MS-COCO, and show that it is on par with existing multimodal embedding techniques while offering more flexibility as the number of modalities increases.
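As a rough illustration of the idea of mapping several modalities into one shared space (as opposed to concatenating their embeddings), the sketch below projects precomputed image and text embeddings into a common vector space and pulls matched pairs together with a cosine objective. This is a minimal, assumed PyTorch formulation; the module name, dimensions, and loss are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonVectorSpace(nn.Module):
    """Hypothetical sketch: project each modality into a single shared space."""
    def __init__(self, img_dim=2048, txt_dim=768, cvs_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, cvs_dim)  # image-embedding projection
        self.txt_proj = nn.Linear(txt_dim, cvs_dim)  # text-embedding projection

    def forward(self, img_emb, txt_emb):
        # L2-normalize so matched pairs can be compared by cosine similarity
        z_img = F.normalize(self.img_proj(img_emb), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_emb), dim=-1)
        return z_img, z_txt

def alignment_loss(z_img, z_txt):
    """Pull paired image/text vectors together (simple cosine objective)."""
    target = torch.ones(z_img.size(0))  # every pair in the batch is a match
    return F.cosine_embedding_loss(z_img, z_txt, target)

# Usage with placeholder embeddings standing in for any image / text encoder
img_emb = torch.randn(8, 2048)
txt_emb = torch.randn(8, 768)
model = CommonVectorSpace()
z_img, z_txt = model(img_emb, txt_emb)
loss = alignment_loss(z_img, z_txt)
loss.backward()
```

Because each modality only needs its own projection into the shared space, adding a further modality means adding one more projection head rather than growing a concatenated input vector, which is the flexibility the abstract refers to.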
Keywords
annotating and captioning images, multimodal reconstruction, vector representation, neural embedding, generative adversarial networks, Common Vector Space