A Multimodal Fusion Scene Graph Generation Method Based on Semantic Description

2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS)(2022)

Abstract
For the scene graph generation task, a multimodal fusion method based on semantic descriptions is proposed to address the long-tail distribution of relationships and the low frequency of high-level semantic interactions in the dataset. First, object detection and relationship inference are performed on the image to construct an image scene graph. Second, the semantic descriptions are parsed by a pre-trained scene graph parser into semantic scene graphs. Finally, the two scene graphs are explicitly aligned and the information on nodes and edges is updated, yielding a fused scene graph with more comprehensive coverage and more accurate semantic interaction information.
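The fusion step described above can be sketched in miniature. The following is an illustrative assumption, not the paper's implementation: scene graphs are represented as sets of node labels plus (subject, predicate, object) triples, nodes are aligned by matching labels, and when both graphs relate the same node pair, the predicate from the semantic (description-derived) graph is kept, reflecting the paper's goal of recovering higher-level semantic interactions.

```python
# Illustrative sketch only: fuse an image scene graph with a semantic scene
# graph by aligning nodes on their labels and merging relation triples.
# The alignment rule (label match) and conflict rule (semantic graph wins)
# are assumptions for illustration, not the method described in the paper.

def fuse_scene_graphs(image_graph, semantic_graph):
    """Each graph is (nodes, edges): a set of node labels and a set of
    (subject, predicate, object) triples over those labels."""
    img_nodes, img_edges = image_graph
    sem_nodes, sem_edges = semantic_graph

    # Align: nodes with the same label are treated as the same entity.
    fused_nodes = img_nodes | sem_nodes

    # Index edges by (subject, object) so conflicting predicates can be resolved.
    fused = {(s, o): p for (s, p, o) in img_edges}
    for s, p, o in sem_edges:
        fused[(s, o)] = p  # description-derived predicate overrides visual inference

    fused_edges = {(s, p, o) for (s, o), p in fused.items()}
    return fused_nodes, fused_edges


# Toy example: the image graph sees only spatial proximity ("near"),
# while the description supplies the semantic interaction ("riding").
image = ({"man", "horse", "hat"},
         {("man", "near", "horse"), ("man", "wearing", "hat")})
semantic = ({"man", "horse"},
            {("man", "riding", "horse")})

nodes, edges = fuse_scene_graphs(image, semantic)
print(sorted(nodes))   # all entities from both graphs
print(sorted(edges))   # "near" replaced by the more informative "riding"
```

The point of the sketch is the update rule: the fused graph keeps the broader node coverage of the image graph while letting the description-derived edges sharpen low-frequency interaction predicates.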
Key words
Scene graph, Semantic description, Multimodal fusion