A descriptive behavior intention inference framework using spatio-temporal semantic features for human-robot real-time interaction

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE (2024)

Abstract
Visual behavior intention inference is crucial for enabling escort robots to interact naturally with humans, yet it is very challenging because successive actions in assistive scenarios exhibit high inter-class similarity and low intra-class distinguishability. Reliable behavior intention inference depends not only on the current state of a behavior but also on semantic information in both the spatial and temporal domains. This paper presents a segmentation-detection-recognition hierarchical system that represents spatio-temporal semantic features to formulate descriptions of body parts, trajectories, and the deep relationships among sub-behaviors. Specifically, a dense trajectory matching scheme based on temporal sampling and the Binarized Normed Gradients (BING) algorithm is formulated to segment 3-Dimensional (3D) behavior cubes; local trajectories are then obtained by clustering the dense trajectories according to distance similarity, and body parts are detected by multi-kernel learning over the encoded local features. Moreover, a global three-stream context Convolutional Neural Network (CNN) is proposed for behavior classification, with a texture module built from expansion, connection, and 1D convolution operations. Scene information is also recognized efficiently via transfer learning. Finally, the semantic descriptors are modeled by two cascaded And-Or Graphs (AoGs) that constrain the spatial scenarios and temporal sequences. The unified approach is demonstrated on two public benchmarks containing long-term activities and on an escort robot in real-world applications.
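To make the trajectory-clustering step concrete, the sketch below groups dense trajectories into local trajectories by distance similarity. It is not the authors' implementation: it assumes equal-length (T, 2) point tracks, uses plain Euclidean distance between flattened tracks in place of the paper's similarity measure, and the cluster count `n_body_parts` is purely illustrative.

```python
# Minimal sketch: cluster dense trajectories into local trajectories.
# Assumption: average-linkage agglomerative clustering on Euclidean
# distances stands in for the paper's distance-similarity grouping.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_dense_trajectories(trajectories, n_body_parts=5):
    """Group (N, T, 2) dense trajectories into local-trajectory clusters."""
    feats = trajectories.reshape(len(trajectories), -1)  # flatten each track
    labels = AgglomerativeClustering(
        n_clusters=n_body_parts, linkage="average"
    ).fit_predict(feats)
    # Each cluster approximates one local trajectory / candidate body part.
    return [trajectories[labels == k] for k in range(n_body_parts)]

# Example: 200 synthetic tracks, 15 frames each.
tracks = np.random.rand(200, 15, 2)
local_trajectories = cluster_dense_trajectories(tracks)
```

The resulting clusters would then be encoded and fed to the multi-kernel body-part detector described in the abstract.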
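The abstract's texture module, described in terms of expansion, connection, and 1D convolution, can be read as dilated branches that are concatenated and fused by a pointwise convolution. The PyTorch sketch below follows that reading; the layer sizes, dilation rates, and the name `TextureModule` are assumptions, not the published architecture.

```python
# Minimal sketch of a texture module: dilated "expansion" branches,
# channel "connection" (concatenation), and a 1x1 fusion convolution.
import torch
import torch.nn as nn

class TextureModule(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # "Expansion": parallel dilated convolutions enlarge the receptive field.
        self.dilated = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d)
            for d in (1, 2, 4)
        ])
        # "Connection" + pointwise fusion of the concatenated branches.
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        branches = [conv(x) for conv in self.dilated]   # expansion
        x = torch.cat(branches, dim=1)                  # connection
        return self.act(self.fuse(x))                   # fusion

# Example: fuse texture cues from a 64-channel feature map.
features = torch.randn(1, 64, 56, 56)
out = TextureModule(64, 32)(features)   # -> (1, 32, 56, 56)
```

In a three-stream context network, one such module per stream would refine appearance features before the streams are combined for behavior classification.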
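Finally, a rough illustration of the cascaded And-Or Graph idea: an AND node requires all of its children, an OR node selects the best-scoring alternative, and the spatial graph feeds the temporal one. The node names, observation scores, and multiplicative scoring below are illustrative assumptions; the paper's cascaded AoGs are more elaborate.

```python
# Minimal sketch of cascaded And-Or Graph scoring for intention inference.
from dataclasses import dataclass, field

@dataclass
class AoGNode:
    name: str
    kind: str                      # "and" | "or" | "leaf"
    children: list = field(default_factory=list)

def score(node, observations):
    """Parse score of a node given leaf observation scores in [0, 1]."""
    if node.kind == "leaf":
        return observations.get(node.name, 0.0)
    child_scores = [score(c, observations) for c in node.children]
    if node.kind == "and":         # all sub-behaviors / scene parts must hold
        s = 1.0
        for cs in child_scores:
            s *= cs
        return s
    return max(child_scores)       # "or": pick the most likely alternative

# Spatial AoG (scene + body-part evidence) cascades into the temporal AoG
# (ordered sub-behaviors), e.g. inferring a "drink" intention.
spatial = AoGNode("scene", "and",
                  [AoGNode("table", "leaf"), AoGNode("hand_near_cup", "leaf")])
temporal = AoGNode("drink", "and",
                   [spatial, AoGNode("reach", "leaf"), AoGNode("grasp", "leaf")])
obs = {"table": 0.9, "hand_near_cup": 0.8, "reach": 0.7, "grasp": 0.6}
print(score(temporal, obs))        # joint score of the hypothesized intention
```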
Keywords
Behavior intention inference, Spatio-temporal semantic representation, Human-robot visual interaction