Is it safe to cross? Interpretable Risk Assessment with GPT-4V for Safety-Aware Street Crossing
CoRR (2024)
Abstract
Safely navigating street intersections is a complex challenge for blind and
low-vision individuals, as it requires a nuanced understanding of the
surrounding context, a task heavily reliant on visual cues. Traditional
methods for assisting in this decision-making process often fall short because
they cannot provide a comprehensive scene analysis or a safety level. This
paper introduces an innovative approach that leverages large multimodal models
(LMMs) to interpret complex street crossing scenes, offering a potential
advancement over conventional traffic signal recognition techniques. By
generating a safety score and scene description in natural language, our method
supports safe decision-making for blind and low-vision individuals. We
collected crosswalk intersection data that contains multiview egocentric images
captured by a quadruped robot and annotated the images with corresponding
safety scores based on our predefined safety score categorization. Grounded in
visual knowledge extracted from the images and a text prompt, we evaluate a
large multimodal model for safety score prediction and scene description. Our
findings highlight the reasoning and safety score prediction capabilities of an
LMM, elicited by various prompts, as a pathway to developing a trustworthy
system, crucial for applications requiring reliable decision-making support.