AlertMe: Towards Natural Language-Based Live Video Trigger Systems at the Edge

MobiSys 2021

Abstract

Advances in deep learning have enabled brand new video analytics systems and applications. Existing systems research on real-time video event detection does not consider matching based on natural language; rather, it focuses on using Domain Specific Languages that define spatio-temporal operators on video streams for efficient matching. Alternatively, research in the multimodal AI community on joint understanding of video and language focuses on applications such as language-based video retrieval, where videos may have been processed offline. In this work, we propose AlertMe, a multimodal-based live video trigger system that matches incoming video streams to a set of user-defined natural language triggers. We dynamically select the optimal sliding window size to extract feature vectors from different modalities in near real time. We also describe our approach to achieve on-device deployment by introducing a profiler to select runtime-efficient feature extractors. Lastly, we show that limiting the number of trigger candidates can significantly increase event detection performance in applications such as task following in AR glasses.
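The core matching step described above (comparing sliding-window video features against natural-language trigger embeddings) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the mean-pooling of frame features, and the similarity threshold are all assumptions for the sake of the example.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_triggers(window_feats, trigger_embs, threshold=0.5):
    """Pool per-frame features over a sliding window and compare the
    result against each natural-language trigger embedding (assumed to
    live in the same joint embedding space). Returns the indices of
    triggers whose similarity clears the threshold.

    window_feats: (num_frames, dim) array of frame features
    trigger_embs: list of (dim,) trigger text embeddings
    """
    pooled = window_feats.mean(axis=0)  # mean-pool the window (an assumption)
    return [i for i, t in enumerate(trigger_embs)
            if cosine_sim(pooled, t) >= threshold]
```

Restricting `trigger_embs` to a small candidate set, as the abstract notes for AR task following, shrinks the comparison loop and reduces the chance of spurious matches.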
Keywords
Edge Computing, Multimodal Learning