
A Novel Deep Learning Multi-Modal Sentiment Analysis Model for English and Egyptian Arabic Dialects Using Audio and Text.

Arab Conference on Information Technology (2023)

Abstract
As emotions play an important role in human interaction, sentiment analysis has become crucial for human-computer interaction. This paper proposes a new model, Audio-Text Fusion (ATFusion), for sentiment analysis that uses both text and speech data to detect emotions. The model comprises local classifiers for the audio and text inputs, whose outputs are combined with the Group Gated Fusion (GGF) technique. Convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and transformers are employed as building blocks for the local classifiers. ATFusion is evaluated on the IEMOCAP dataset for English and the EYASE dataset for the Egyptian Arabic dialect. Compared to other state-of-the-art models, ATFusion achieves unweighted and weighted accuracies of 76.213% and 75.146% on IEMOCAP, and 70.79% and 70.42% on EYASE, respectively.
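The abstract describes local audio and text classifiers whose features are merged by a gated fusion step. As an illustration only, the sketch below shows a generic gated fusion of per-modality features in PyTorch; the dimensions, module names, and single-gate formulation are assumptions made for readability, not the authors' exact GGF design.

```python
# Minimal sketch of gated audio-text fusion in PyTorch.
# Dimensions and the single-gate formulation are illustrative assumptions,
# not the paper's GGF implementation.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, audio_dim: int, text_dim: int, hidden_dim: int):
        super().__init__()
        # Project each modality's local-classifier features to a shared space.
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Gate decides, per dimension, how much each modality contributes.
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, audio_feat: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
        a = torch.tanh(self.audio_proj(audio_feat))
        t = torch.tanh(self.text_proj(text_feat))
        g = torch.sigmoid(self.gate(torch.cat([a, t], dim=-1)))
        # Convex combination of the two modality representations.
        return g * a + (1.0 - g) * t

# Usage: fuse a batch of audio/text embeddings before emotion classification.
fusion = GatedFusion(audio_dim=128, text_dim=768, hidden_dim=256)
audio_feat = torch.randn(4, 128)  # e.g. CNN/LSTM speech features (assumed size)
text_feat = torch.randn(4, 768)   # e.g. transformer text features (assumed size)
fused = fusion(audio_feat, text_feat)  # shape: (4, 256)
```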
Keywords
Sentiment Analysis, Multi-Modal, Deep Learning, Natural Language Processing, Information Fusion