Public-private Attributes-based Variational Adversarial Network for Audio-Visual Cross-Modal Matching

IEEE Transactions on Circuits and Systems for Video Technology (2024)

Abstract
Existing audio-visual cross-modal matching methods focus on mitigating cross-modal heterogeneity but ignore the intra-class discrepancy of the same identity across different scenarios, which can greatly limit matching performance. To handle intra-class discrepancy and cross-modal heterogeneity simultaneously, we propose a novel public-private attributes-based variational adversarial network (P²VANet) for audio-visual cross-modal matching, which captures the consistency within and between classes. In particular, P²VANet first uses a variational auto-encoder to reduce the intra-class discrepancy; through reconstruction, the auto-encoder's hidden variable captures the inherent global information shared across diverse scenarios. It then integrates a public attributes guidance module, which captures the consistency between the audio and visual modalities under the supervision of common high-level semantic information, to mitigate cross-modal heterogeneity. In addition, P²VANet includes a private attributes embedding module that enhances the discriminative features inherent in each class to decrease inter-class similarity. Extensive experiments on audio-visual cross-modal matching demonstrate the effectiveness of the proposed approach compared with state-of-the-art methods.
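The abstract does not specify the training objective, so the following is only a minimal sketch of the textbook variational auto-encoder loss (reconstruction error plus a KL term on the hidden variable), not the P²VANet implementation. All function names, the squared-error reconstruction term, and the `beta` weight are assumptions introduced for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one draw per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def reconstruction_error(x, x_hat):
    """Squared-error reconstruction term (an assumed choice, not from the paper)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Generic VAE objective: reconstruction error plus beta-weighted KL term."""
    return reconstruction_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.2, -0.1], [0.0, -1.0]   # hypothetical encoder outputs
    z = reparameterize(mu, log_var, rng)     # hidden variable fed to a decoder
    x, x_hat = [1.0, 0.5], [0.9, 0.6]        # hypothetical input and reconstruction
    print(len(z), vae_loss(x, x_hat, mu, log_var))
```

In this generic formulation, the reconstruction term pushes the hidden variable to retain the information needed to rebuild the input, while the KL term regularizes it toward a shared prior; how P²VANet combines this with its adversarial and attribute modules is not stated in the abstract.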
Key words
Audio-visual cross-modal matching, variational adversarial learning, public-private attributes, metric learning