Pedestrian Crossing Intention Prediction Based on Cross-Modal Transformer and Uncertainty-Aware Multi-Task Learning for Autonomous Driving

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS (2024)

Abstract
Accurately predicting whether pedestrians will cross the street is widely recognized as an indispensable function of autonomous driving systems, especially in urban environments. One of the major challenges is how to exploit the complementary information present in different types of data (modalities). This paper makes the first attempt to develop a cross-modal transformer-based crossing intention prediction model that uses only bounding boxes and ego-vehicle speed as input features. The cross-modal transformer leverages self-attention and cross-modal attention to mine modality-specific and complementary correlations. A bottleneck feature fusion is presented to obtain a compressed feature representation. To facilitate network training, we further propose a novel uncertainty-aware multi-task learning method that jointly predicts the future bounding box and the crossing action, so that the commonalities and differences between the two tasks can be exploited. To evaluate the proposed method, extensive comparative experiments and ablation studies are performed on two benchmark datasets. The results demonstrate that, using only the bounding box and ego-vehicle speed as input features, our model is on a par with other state-of-the-art approaches that rely on more inputs, and even achieves superior performance in most cases. The source code will be released at https://github.com/xbchen82/PedCMT.
Keywords
Crossing intention prediction, cross-modal transformer, multi-task learning, homoscedastic uncertainty
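
The abstract does not spell out how the two task losses are combined. As a rough illustration of the "homoscedastic uncertainty" keyword, the sketch below uses one commonly used formulation, in which each task loss L_i is scaled by exp(-s_i) and a penalty s_i is added, with s_i = log(sigma_i^2) a learnable scalar per task. This form, the function name `uncertainty_weighted_loss`, and the numeric loss values are assumptions made for illustration, not the authors' implementation.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses with per-task log-variances s_i = log(sigma_i^2).

    Assumed form (not taken from the paper): total = sum_i exp(-s_i) * L_i + s_i.
    In training, each s_i would be a learnable parameter; here it is a plain float.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("one log-variance per task is required")
    return sum(math.exp(-s) * loss + s
               for loss, s in zip(task_losses, log_variances))


# Hypothetical loss values for the two tasks named in the abstract.
box_regression_loss = 0.8           # future bounding box prediction
crossing_classification_loss = 0.3  # crossing action prediction

# With every s_i = 0 all weights equal 1, so the result is a plain sum.
equal = uncertainty_weighted_loss(
    [box_regression_loss, crossing_classification_loss], [0.0, 0.0])

# A larger s for the box task shrinks its weight but adds s as a penalty,
# which discourages the trivial solution of inflating every s_i.
reweighted = uncertainty_weighted_loss(
    [box_regression_loss, crossing_classification_loss], [1.0, 0.0])
```

In this assumed form, the weighting is learned jointly with the network rather than tuned by hand, which is the usual motivation for uncertainty-based multi-task weighting.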