DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
CVPR 2024
Abstract
Scene graph generation aims to capture detailed spatial and semantic
relationships between objects in an image, which is challenging due to
incomplete labelling, long-tailed relationship categories, and relational
semantic overlap. Existing Transformer-based methods either employ distinct
queries for objects and predicates or utilize holistic queries for relation
triplets and hence often suffer from limited capacity in learning low-frequency
relationships. In this paper, we present a new Transformer-based method, called
DSGG, that views scene graph detection as a direct graph prediction problem
based on a unique set of graph-aware queries. In particular, each graph-aware
query encodes a compact representation of both a node and all of its
relations in the graph, learned through relaxed sub-graph matching during
training. Moreover, to address the problem of relational semantic overlap, we
adopt a relation distillation strategy that aims to efficiently learn multiple
instances of semantic relationships.
Extensive experiments on the Visual Genome (VG) and Panoptic Scene Graph (PSG)
datasets show that our model achieves state-of-the-art results, with
improvements of 3.5% and 6.7% in mR@50 and mR@100 on the scene graph
generation task and even larger improvements of 8.5% and 10.3% in mR@50 and
mR@100 on the panoptic scene graph generation task. Code is available at
.
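
The mR@K figures quoted above are mean recall at K: recall is computed separately for each predicate category and then averaged, so rare predicates weigh as much as frequent ones. The following is a minimal sketch of that idea, not the paper's evaluation code. It assumes triplets are already resolved to hashable `(subject, predicate, object)` tuples (no box or mask IoU matching) and that hits are pooled over all images per predicate; the function name `mean_recall_at_k` and the toy data are hypothetical.

```python
from collections import defaultdict


def mean_recall_at_k(gt_per_image, pred_per_image, k):
    """Simplified mean recall@K over predicate categories.

    gt_per_image:   list of sets of (subject, predicate, object) triplets.
    pred_per_image: list of lists of triplets, sorted by descending confidence.
    Assumption: a prediction counts as a hit only on exact tuple equality.
    """
    hits = defaultdict(int)    # correctly recalled ground-truth triplets per predicate
    totals = defaultdict(int)  # ground-truth triplets per predicate
    for gt, preds in zip(gt_per_image, pred_per_image):
        top_k = set(preds[:k])  # keep only the K most confident predictions
        for triplet in gt:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in top_k:
                hits[predicate] += 1
    if not totals:
        return 0.0
    # Average the per-predicate recalls so each category counts equally.
    recalls = [hits[p] / totals[p] for p in totals]
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): "on" is frequent, "wearing" is rare.
gt = [
    {("man", "on", "horse"), ("man", "wearing", "hat")},
    {("dog", "on", "sofa")},
]
preds = [
    [("man", "on", "horse"), ("man", "holding", "hat")],
    [("dog", "near", "sofa"), ("dog", "on", "sofa")],
]
print(mean_recall_at_k(gt, preds, k=2))  # recall("on") = 2/2, recall("wearing") = 0/1
```

In this toy case plain recall@2 would be 2/3, while the mean over predicates is lower because the missed "wearing" triplet forms an entire category, which is why mR@K is the usual metric for long-tailed relationship distributions.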