Unimodal Multi-Task Fusion for Emotional Mimicry Prediction
arXiv (2024)
Abstract
In this study, we propose a methodology for the Emotional Mimicry Intensity
(EMI) Estimation task within the context of the 6th Workshop and Competition on
Affective Behavior Analysis in-the-wild. Our approach leverages the Wav2Vec 2.0
framework, pre-trained on a comprehensive podcast dataset, to extract a broad
range of audio features encompassing both linguistic and paralinguistic
elements. We enhance feature representation through a fusion technique that
integrates individual features with a global mean vector, introducing global
contextual insights into our analysis. Additionally, we incorporate a
pre-trained valence-arousal-dominance (VAD) module from the Wav2Vec 2.0 model.
The fused features are processed by a Long Short-Term Memory (LSTM) network for
efficient temporal analysis of the audio data. Using only the provided audio data, our
approach demonstrates significant improvements over the established baseline.
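
The fusion step described above, combining each individual feature with a global mean vector, can be illustrated with a minimal sketch. The abstract does not say how the two are combined, so concatenation is an assumption here, and the function name `fuse_with_global_mean` and the toy feature values are hypothetical. Plain Python lists stand in for the frame-level Wav2Vec 2.0 features to keep the example dependency-free.

```python
from statistics import fmean
from typing import List, Sequence


def fuse_with_global_mean(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append the utterance-level mean vector to every frame-level feature vector.

    ASSUMPTION: fusion is done by concatenation. The abstract only states that
    individual features are integrated with a global mean vector.
    """
    if not frames:
        return []
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimensionality")
    # Global mean vector: average each feature dimension over all time steps.
    global_mean = [fmean(f[d] for f in frames) for d in range(dim)]
    # Each fused frame is [local features | global mean], so its size is 2 * dim.
    return [list(f) + global_mean for f in frames]


# Toy stand-in for a sequence of 3 frames with 2 features each (hypothetical values).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused = fuse_with_global_mean(frames)
```

In a pipeline of the kind the abstract outlines, a fused sequence like this would then be passed to the LSTM, so that every time step carries both local and utterance-level context.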