
Align R-CNN: A Pairwise Head Network for Visual Relationship Detection

IEEE Transactions on Multimedia (2022)

Abstract
Scene graphs connect individual objects with visual relationships. They serve as a comprehensive scene representation for downstream multimodal tasks. However, by examining recent progress in Scene Graph Generation (SGG), we find that the performance of recent works is highly limited by pairwise relationship modeling based on naive feature concatenation. Such pairwise features lack sufficient object interaction because the object parts are misaligned, resulting in non-discriminative pairwise features for visual relationship prediction. For example, naively concatenated pairwise features often cause the model to fail to discriminate between "riding" and "feeding" for the object pair "person" and "horse". To this end, we design a meta-architecture, learning-to-align, for dynamic object feature concatenation. We call our model Align R-CNN. Specifically, we introduce a novel attention-based multiple region alignment module that can be jointly optimized with SGG. Experiments on the large-scale SGG benchmark Visual Genome show that the proposed Align R-CNN can replace the naive feature concatenation and thus boost all existing SGG methods.
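The contrast between naive concatenation and attention-based alignment can be illustrated with a toy sketch. This is not the Align R-CNN module; the abstract does not give its details. It assumes, hypothetically, that each object is represented by a small set of part (region) feature vectors, that the baseline mean-pools each object and concatenates, and that "alignment" means each subject part attends over the object's parts via scaled dot-product attention before pooling. All function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(parts: List[Vector]) -> Vector:
    """Average a list of equal-length part features into one vector."""
    n = len(parts)
    return [sum(col) / n for col in zip(*parts)]


def naive_pair_feature(subj_parts: List[Vector], obj_parts: List[Vector]) -> Vector:
    """Baseline: pool each object independently, then concatenate.

    No subject part ever "sees" a specific object part, so part-level
    correspondences (e.g. a person's hand and a horse's mouth) are lost.
    """
    return mean_pool(subj_parts) + mean_pool(obj_parts)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def aligned_pair_feature(subj_parts: List[Vector], obj_parts: List[Vector]) -> Vector:
    """Hypothetical alignment: each subject part attends over the object parts
    (scaled dot-product), the attended object feature is concatenated with that
    subject part, and the fused per-part vectors are then mean-pooled."""
    d = len(subj_parts[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for s in subj_parts:
        weights = softmax([dot(s, o) * scale for o in obj_parts])
        # Weighted sum of object parts, dimension by dimension.
        attended = [sum(w * o[k] for w, o in zip(weights, obj_parts)) for k in range(d)]
        fused.append(s + attended)  # list concatenation: [subject part | aligned object]
    return mean_pool(fused)


if __name__ == "__main__":
    person = [[2.0, 0.0], [2.0, 0.0]]  # toy subject part features
    horse = [[1.0, 0.0], [0.0, 1.0]]   # toy object part features
    print(naive_pair_feature(person, horse))  # [2.0, 0.0, 0.5, 0.5]
    print(aligned_pair_feature(person, horse))
```

In this toy case the naive feature weights both horse parts equally, while the aligned feature leans toward the horse part most similar to the person's parts. That per-pair, content-dependent weighting is the kind of "dynamic" concatenation the abstract contrasts with a fixed one, though the real module is learned end-to-end with SGG rather than using raw dot products.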
Key words
Feature extraction, Visualization, Task analysis, Predictive models, Semantics, Context modeling, Data mining, Pairwise feature alignment, scene graph generation, visual attention