SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers
CoRR (2024)
Abstract
Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to
access video content. However, conventional static ADs can leave out detailed
information in videos, impose a high mental load, neglect the diverse needs and
preferences of BLV users, and lack immersion. To tackle these challenges, we
introduce SPICA, an AI-powered system that enables BLV users to interactively
explore video content. Informed by prior empirical studies on BLV video
consumption, SPICA offers novel interactive mechanisms for supporting temporal
navigation of frame captions and spatial exploration of objects within key
frames. Leveraging an audio-visual machine learning pipeline, SPICA augments
existing ADs by adding interactivity, spatial sound effects, and individual
object descriptions without requiring additional human annotation. Through a
user study with 14 BLV participants, we evaluated the usability and usefulness
of SPICA and explored user behaviors, preferences, and mental models when
interacting with augmented ADs.