Gaze Target Estimation inspired by Interactive Attention

IEEE Transactions on Circuits and Systems for Video Technology (2022)

Abstract
As an essential nonverbal cue, the human gaze reveals human intentions and plays a crucial role in daily activities. Automatic detection of a person's gaze target has therefore drawn the interest of the computer vision community: it is useful not only for identifying whether children are attentive in class but also for locating items of interest to shoppers in retail settings. Existing gaze-following methods have explored and exploited only the scene context and head cues. Considering the significance of human-object interaction in understanding human intentions, we present the Visual-Spatial Graph and introduce a graph attention network to analyze the interaction probability between the human and elements in the scene. The interaction probability, inferred from visual-spatial information aggregated by the attention mechanism, is then transformed into an interactive attention map that depicts the areas people care about. In addition, we construct a transformer encoder to integrate the features extracted by the scene and head pathways, aiming to decode the gaze target. With interactive attention, our proposed method achieves outstanding performance on two benchmarks: GazeFollow and VideoAttentionTarget. Our code is available at https://github.com/nkuhzx/VSG-IA.
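The pipeline the abstract describes — scoring scene elements against the human via graph attention to get interaction probabilities, then painting those probabilities into a spatial "interactive attention" map — can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' implementation: the single-layer dot-product scoring, the function names, and the bounding-box painting scheme are all assumptions.

```python
import numpy as np

def interaction_probabilities(head_feat, elem_feats, w):
    """Simplified single-head graph attention: score each scene-element node
    against the human (head) node by projecting the concatenated features
    with a learned vector w, then softmax over elements."""
    scores = np.array(
        [np.dot(w, np.concatenate([head_feat, f])) for f in elem_feats]
    )
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()

def interactive_attention_map(probs, boxes, hw):
    """Paint each element's interaction probability into its bounding box
    (x1, y1, x2, y2), taking the max where boxes overlap, yielding a coarse
    attention map over the scene."""
    h, w = hw
    amap = np.zeros((h, w))
    for p, (x1, y1, x2, y2) in zip(probs, boxes):
        amap[y1:y2, x1:x2] = np.maximum(amap[y1:y2, x1:x2], p)
    return amap

# Toy usage: one person node, three detected scene elements.
rng = np.random.default_rng(0)
head = rng.standard_normal(4)
elems = [rng.standard_normal(4) for _ in range(3)]
w = rng.standard_normal(8)
probs = interaction_probabilities(head, elems, w)
amap = interactive_attention_map(probs, [(0, 0, 2, 2), (2, 2, 4, 4), (1, 1, 3, 3)], (4, 4))
```

In the paper's actual model this map would then modulate the scene features that the transformer encoder fuses with the head-pathway features before decoding the gaze target.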
Keywords
Interactive attention map, interaction probability, visual-spatial graph, transformer encoder, gaze-following