Representation Learning Through Cross-Modality Supervision

2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019)

Cited by 19 | Viewed 17
Abstract
Learning robust representations for applications with multiple input modalities can have a significant impact on their performance. Traditional representation learning methods rely on projecting the input modalities onto a common subspace to maximize agreement amongst the modalities for a particular task. We propose a novel approach to representation learning that uses a latent representation decoder to reconstruct the target modality, thereby employing the target modality purely as a supervision signal for discovering correlations between the modalities. Through cross-modality supervision, we demonstrate that the learned representation improves performance on facial action unit (AU) recognition compared with modality-specific representations and even their fused counterparts. Our experiments on three AU recognition datasets (MMSE, BP4D, and DISFA) show strong performance gains, producing state-of-the-art results despite the absence of a modality.
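The following is a minimal sketch of the cross-modality supervision idea described in the abstract, not the authors' implementation: an encoder maps the input modality to a latent code, a decoder reconstructs the target modality from that code (so the target modality is used only as a training-time supervision signal), and an AU head classifies from the same latent code. Per the keywords, thermal images are assumed to be the reconstruction target and RGB frames the input; all layer sizes, loss weights, and names are hypothetical.

```python
# Minimal sketch of cross-modality supervision for AU recognition (assumptions:
# RGB input modality, thermal target modality, MSE reconstruction loss).
import torch
import torch.nn as nn

class CrossModalitySupervision(nn.Module):
    def __init__(self, latent_dim=128, num_aus=12):
        super().__init__()
        # Encoder for the input (RGB) modality -> latent representation.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        # Decoder reconstructs the target (thermal) modality from the latent code,
        # so the thermal stream supervises training but is never needed at test time.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
        # AU recognition head on the shared latent representation.
        self.au_head = nn.Linear(latent_dim, num_aus)

    def forward(self, rgb):
        z = self.encoder(rgb)
        return self.au_head(z), self.decoder(z)

model = CrossModalitySupervision()
rgb = torch.randn(4, 3, 32, 32)       # input modality batch
thermal = torch.randn(4, 1, 32, 32)   # target modality (training only)
au_labels = torch.randint(0, 2, (4, 12)).float()

au_logits, recon = model(rgb)
loss = (nn.functional.binary_cross_entropy_with_logits(au_logits, au_labels)
        + nn.functional.mse_loss(recon, thermal))
# At inference, only the RGB branch is used, which matches the paper's claim
# of strong results despite the absence of a modality.
```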
Keywords
cross-modality supervision,thermal image reconstruction,facial action unit recognition,target modality reconstruction,representation decoder,representation learning,supervision signal