Deep Emotion Recognition Based On Audio-Visual Correlation

Noushin Hajarolasvadi, Hasan Demirel

IET Computer Vision (2020)

Abstract
Over the last decade, human emotion recognition has been studied by means of unimodal channels. However, efforts continue to answer intriguing questions about how different modalities can complement each other. This study proposes a multimodal approach that uses three-dimensional (3D) convolutional neural networks (CNNs) to model human emotion through a modality-referenced system, while investigating answers to such questions. The proposed modality-referenced system selects the input data based on one of the modalities, regarded as the reference or master. The other modality, referred to as the slave, simply adjusts or attunes itself to the master in the temporal domain. In this context, the authors developed three multimodal emotion recognition systems, namely a video-referenced system, an audio-referenced system, and an audio-visual-referenced system, to explore the congruence impact of the audio and video modalities on each other. Two pipelines of 3D CNN architectures are employed: k-means clustering is used in the master pipeline, and the slave pipeline adapts itself temporally. The outputs of the two pipelines are fused to improve recognition performance. In addition, canonical correlation analysis and t-distributed stochastic neighbour embedding are used to validate the experiments. Results show that temporal alignment of the data between the two modalities improves recognition performance significantly.
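The abstract describes the master/slave idea only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the master modality (here video) is given as per-frame feature vectors, that k-means "selection" means keeping the frame nearest each cluster centroid, that the slave modality (here audio) is attuned by picking the sample nearest in time to each selected master frame, and that fusion is a weighted average of the two pipelines' class probabilities. The function names, the 25 fps and 10 ms rates, and the fusion weight `w` are hypothetical, and the 3D CNNs themselves are not reproduced.

```python
import bisect
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[Vector], List[int]]:
    """Plain Lloyd's k-means with deterministic, evenly spaced initialisation."""
    n = len(points)
    # Seed centroids with k frames spread evenly over the sequence.
    centroids = [list(points[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: mean of the assigned points (an empty cluster keeps its centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def select_key_frames(features: List[Vector], k: int) -> List[int]:
    """Master pipeline (assumed): one frame per cluster, the one nearest its centroid."""
    centroids, labels = kmeans(features, k)
    chosen = []
    for c, centroid in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            chosen.append(min(idx, key=lambda i: sq_dist(features[i], centroid)))
    return sorted(chosen)


def align_slave(master_times: List[float], chosen: List[int], slave_times: List[float]) -> List[int]:
    """Slave pipeline (assumed): for each chosen master timestamp, the slave
    index nearest in time. slave_times must be sorted ascending."""
    aligned = []
    for i in chosen:
        t = master_times[i]
        j = bisect.bisect_left(slave_times, t)
        # The nearest slave sample is one of the two neighbours of the insertion point.
        candidates = [c for c in (j - 1, j) if 0 <= c < len(slave_times)]
        aligned.append(min(candidates, key=lambda c: abs(slave_times[c] - t)))
    return aligned


def late_fusion(p_master: Vector, p_slave: Vector, w: float = 0.5) -> Vector:
    """Weighted average of the two pipelines' class-probability vectors."""
    return [w * a + (1 - w) * b for a, b in zip(p_master, p_slave)]


if __name__ == "__main__":
    # Toy video features: three visually distinct segments of 4 frames each, at 25 fps.
    video = [[0.0, 0.0]] * 4 + [[5.0, 5.0]] * 4 + [[10.0, 0.0]] * 4
    video_t = [i / 25 for i in range(len(video))]
    audio_t = [i * 0.01 for i in range(50)]  # hypothetical 10 ms audio hop
    keys = select_key_frames(video, k=3)
    print(keys, align_slave(video_t, keys, audio_t))  # [0, 4, 8] [0, 16, 32]
```

Swapping the roles (audio as master, video as slave) only changes which stream feeds `select_key_frames`; nearest-timestamp matching is one simple choice of temporal alignment, and windowed or interpolated matching would be equally plausible readings of the abstract.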
Keywords
emotion recognition, convolutional neural nets, neural net architecture, video signal processing, audio-visual systems, image fusion, correlation methods, pattern clustering, temporal data alignment, slave pipeline, k-means clustering, 3D CNN architectures, temporal domain, unimodal channels, modality-referenced system, three-dimensional convolutional neural networks, multimodal approach, variant modalities, human emotion recognition, audio-visual correlation, deep emotion recognition, t-distributed stochastic neighbour embedding, canonical correlation analysis, recognition performance, master pipeline, video modalities, audio-visual-referenced system, audio-referenced system, video-referenced system, multimodal emotion recognition system