QLDT: adaptive Query Learning for HOI Detection via vision-language knowledge Transfer

Xincheng Wang,Yongbin Gao,Wenjun Yu,Chenmou Wu, Mingxuan Chen,Honglei Ma, Zhichao Chen

Applied Intelligence(2024)

引用 0|浏览1
暂无评分
摘要
Human-object interaction detection can be mainly categorized into two core problems, namely human-object association detection and interaction understanding. Firstly, for association detection, previous methods tend to directly detect obvious human-object interaction pairs, while ignoring some interaction pairs that may have potential interaction relationships, which is contrary to the actual situation. Secondly, for the interaction understanding problem, traditional methods face the challenges of long-tailed distribution and zero-shot detection, which cannot flexibly deal with complex and changing real-world scenarios. To this end, adaptive Query Learning for HOI Detection via vision-language knowledge Transfer(QLDT) is proposed. Specifically, a two-stage dynamic matching scoring algorithm based on dynamically changing thresholds and scores is designed to explore obscure H-O pairs and labeling to enlarge the sample size. Secondly, a visual-language pre-trained model GLIP (Grounded Language-Image Pre-training), is introduced to enhance the model’s interactive comprehension ability, extract the visual and linguistic features of the images through GLIP, minimize the gap with the predicted values using cross-entropy loss, and take the maximum value of the score with the obscure H-O pairs as the final prediction, which ensures the model’s positivity. The proposed method shows excellent performance on both HICO-DET and V-COCO datasets, for HICO-DET, QLDT achieved 35.37
更多
查看译文
关键词
HOI detection,Cross-modal,Interactive understanding,Query learning,Knowledge transfer
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要