A context-sensitive multi-tier deep learning framework for multimodal sentiment analysis

P. Ganesh Kumar,S. Arul Antran Vijay,V. Jothi Prakash,Anand Paul,Anand Nayyar

MULTIMEDIA TOOLS AND APPLICATIONS（2023）

引用 0|浏览1

暂无评分

摘要

One of the most appealing multidisciplinary research areas in Artificial Intelligence (AI) is Sentiment Analysis (SA). Due to the intricate and complementary interactions between several modalities, Multimodal Sentiment Analysis (MSA) is an extremely difficult work that has a wide range of applications. In the subject of multimodal sentiment analysis, numerous deep learning models and different techniques have been suggested, but they do not investigate the explicit context of words and are unable to model diverse components of a sentence. Hence, the full potential of such diverse data has not been explored. In this research, a Context-Sensitive Multi-Tier Deep Learning Framework (CS-MDF) is proposed for sentiment analysis on multimodal data. The CS-MDF uses a three-tier architecture for extracting context-sensitive information. The first tier utilizes Convolutional Neural Network (CNN) for extracting text-based features, 3D-CNN model for extracting visual features and open-Source Media Interpretation by Large feature-space Extraction (openSMILE) tool kit for audio feature extraction.The first tier focuses on extracting the unimodal features from the utterances. This level of extraction ignores context-sensitive data while determining the feature.CNNs are suitable for text data because they are particularly useful for identifying local patterns and dependencies in data.The second tier uses the features extracted from the first tier.The context-sensitive unimodal characteristics are extracted in this tier using the Bi-directional Gated Recurrent Unit (BiGRU), which is used to comprehend inter-utterance links and uncover contextual evidence.The output from tier two is combined and passed to the third tier, which fuses the features from different modalities and trains a single BiGRU model that provides the final classification.This method applies the BiGRU model to sequential data processing, using the advantages of both modalities and capturing their interdependencies.Experimental results obtained on six real-life datasets (Flickr Images dataset, Multi-View Sentiment Analysis dataset, Getty Images dataset, Balanced Twitter for Sentiment Analysis dataset, CMU-MOSI Dataset) show that the proposed CS-MDF model has achieved better performance compared with ten state-of-the-art approaches, which are validated by F1 score, precision, accuracy, and recall metrics.An ablation study is carried out on the proposed framework that demonstrates the viability of the design. The GradCAM visualization technique is applied to visualize the aligned input image-text pairs learned by the proposed CS-MDF model.

查看译文

关键词

Deep learning,Gated recurrent unit,Information retrieval,Multimedia analysis,Multimodal sentiment analysis,Sentiment analysis

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要