Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data
CoRR (2024)
Abstract
The ability to generate sentiment-controlled feedback in response to
multimodal inputs, comprising both text and images, addresses a critical gap in
human-computer interaction by enabling systems to provide empathetic, accurate,
and engaging responses. This capability has profound applications in
healthcare, marketing, and education. To this end, we construct a large-scale
Controllable Multimodal Feedback Synthesis (CMFeed) dataset and propose a
controllable feedback synthesis system. The proposed system comprises an
encoder, a decoder, and a controllability block for the textual and visual
inputs. It extracts textual features with a transformer network and visual
features with a Faster R-CNN network, then combines them to generate
feedback. The CMFeed dataset encompasses
images, text, reactions to the post, human comments with relevance scores, and
reactions to the comments. The reactions to the post and comments are utilized
to train the proposed model to produce feedback with a particular (positive or
negative) sentiment. A sentiment classification accuracy of 77.23% is
achieved, 18.82% higher than that of the model without the controllability block.
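The sentiment-controlled generation described above can be illustrated with a minimal sketch: modality features are pooled, concatenated with a control signal, and projected to a joint state that a decoder would condition on. All names, dimensions, and the one-hot control encoding below are illustrative assumptions, not the paper's actual implementation (which uses trained transformer and Faster R-CNN encoders).

```python
import numpy as np

# Hypothetical feature sizes; the paper does not specify these here.
TEXT_DIM, VIS_DIM, CTRL_DIM, HID_DIM = 8, 8, 2, 16
rng = np.random.default_rng(0)

def encode_text(tokens):
    """Stand-in for transformer text features: one vector per token."""
    return rng.standard_normal((len(tokens), TEXT_DIM))

def encode_image(num_regions):
    """Stand-in for Faster R-CNN region features: one vector per region."""
    return rng.standard_normal((num_regions, VIS_DIM))

def sentiment_control(positive):
    """Controllability signal as a one-hot (positive, negative) vector."""
    return np.array([1.0, 0.0]) if positive else np.array([0.0, 1.0])

def fuse(text_feats, vis_feats, ctrl):
    """Mean-pool each modality, concatenate with the control vector,
    and project to a joint hidden state for the decoder to condition on."""
    pooled = np.concatenate([text_feats.mean(axis=0),
                             vis_feats.mean(axis=0), ctrl])
    W = rng.standard_normal((HID_DIM, pooled.size)) * 0.1  # untrained weights
    return np.tanh(W @ pooled)

h = fuse(encode_text(["a", "b", "c"]), encode_image(4), sentiment_control(True))
print(h.shape)  # (16,)
```

In a trained system, the control vector would steer the decoder toward positive or negative feedback; here it only demonstrates where the signal enters the fusion.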
Moreover, the system incorporates a similarity module for assessing feedback
relevance through rank-based metrics. It implements an interpretability
technique to analyze the contribution of textual and visual features during the
generation of uncontrolled and controlled feedback.
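Rank-based relevance assessment, as performed by the similarity module, can be sketched with a standard metric such as mean reciprocal rank over human comments ranked by model similarity; the specific metrics the paper uses are not named in this abstract, so MRR here is an illustrative choice.

```python
def mean_reciprocal_rank(ranked_relevance):
    """MRR over queries: for each query's ranked list (1 = relevant,
    0 = not), take 1/rank of the first relevant item, else 0; average."""
    total = 0.0
    for rels in ranked_relevance:
        rr = 0.0
        for rank, rel in enumerate(rels, start=1):
            if rel:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_relevance)

# Two posts: first relevant comment at rank 1 and rank 3 -> (1 + 1/3) / 2
print(mean_reciprocal_rank([[1, 0, 0], [0, 0, 1]]))  # 0.666...
```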