
MHRN: A Multimodal Hierarchical Reasoning Network for Topic Detection

IEEE TRANSACTIONS ON MULTIMEDIA(2024)

Abstract
Multimodal topic detection is an important social media analysis task with a wide variety of real-world applications. However, jointly modeling multimodal data and inferring its topics is challenging due to the semantic gaps between different modalities. Our insights draw on psychological findings pertaining to the hierarchical structure in humans' inherent perception of images and texts. In this paper, we propose a Multimodal Hierarchical Reasoning Network (MHRN) to perform multimodal inference for topic detection. The images and texts are represented in a hierarchical model named the Multimodal Part-whole Aware Graph (MPAG). MHRN then performs reasoning for topic inference based on three modules: a Bottom-Up Aggregation (BUA) module for encoding the hierarchical connections and sibling relations in MPAG, a Top-Down Guidance (TDG) module for enriching the features of nodes in MPAG under the guidance of their parents, and a Bottom-Up Cross Aggregation (BUCA) module for capturing and aggregating cross-modality cues to achieve effective multimodal reasoning. Extensive experiments are conducted on two benchmarks, and the results demonstrate the superiority of our approach.
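The three reasoning stages described in the abstract (bottom-up aggregation over a part-whole hierarchy, top-down guidance from parents, and cross-modal aggregation between the image and text sides) can be illustrated with a minimal sketch. The paper does not specify the update rules, so the simple averaging and softmax-attention operations below, and all function names, are hypothetical stand-ins for the learned modules:

```python
import numpy as np

def bottom_up(feats, children):
    # BUA-style pass (hypothetical rule): walking leaves-to-root,
    # each parent averages its feature with the mean of its children.
    # Nodes are assumed ordered so that parents precede children.
    for node in reversed(range(len(feats))):
        if children[node]:
            child_mean = np.mean([feats[c] for c in children[node]], axis=0)
            feats[node] = (feats[node] + child_mean) / 2.0
    return feats

def top_down(feats, parent):
    # TDG-style pass (hypothetical rule): each non-root node is
    # enriched by blending in its parent's feature.
    for node in range(len(feats)):
        if parent[node] is not None:
            feats[node] = (feats[node] + feats[parent[node]]) / 2.0
    return feats

def cross_aggregate(img_feats, txt_feats):
    # BUCA-style pass (hypothetical rule): each image node attends over
    # all text nodes via softmax similarity and fuses the attended summary.
    sims = img_feats @ txt_feats.T
    attn = np.exp(sims) / np.exp(sims).sum(axis=1, keepdims=True)
    return (img_feats + attn @ txt_feats) / 2.0

# Toy usage: a 3-node image hierarchy (root 0 with children 1, 2)
# and 2 text nodes, all with 4-dimensional features.
img = bottom_up(np.ones((3, 4)), children=[[1, 2], [], []])
img = top_down(img, parent=[None, 0, 0])
fused = cross_aggregate(img, np.ones((2, 4)))
```

This only conveys the direction of information flow in each stage; in the actual MHRN these updates are learned graph operations, not fixed averages.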
Key words
Multimodal topic detection, image and text fusion, hierarchical learning