Object-Level Feature Memory and Aggregation for Live-Stream Video Object Detection

Yi Li, Sile Ma, Zhenyu Li, Yizhong Luan, Zecui Jiang

2023 China Automation Congress (CAC), 2023

Abstract
This paper proposes an object-level feature memory module that uses attention mechanisms to exploit spatial and temporal context in videos. Compared to still-image object detectors, video object detectors consider features along the spatiotemporal dimensions, leading to higher accuracy. However, previous video object detection methods often performed memory and fusion at the feature-map level when integrating features across frames. These approaches not only introduce significant computational and memory burdens but also inject considerable noise. To address these challenges, we introduce an object-level feature memory that retains features from previous frames while reducing memory and computational overhead, yielding a substantial improvement in video object detector performance. Experiments on the UA-DETRAC dataset validate the effectiveness of our approach in live-stream video object detection scenarios. Our method achieves 66.73% AP based on YOLOX-S, which is 4.0% AP higher than the baseline YOLOX-S. Our code is released at https://github.com/Liyi4578/0FMA.
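The core idea in the abstract, a memory of per-object feature vectors (rather than whole feature maps) that is fused with current-frame detections via attention, can be illustrated with a minimal NumPy sketch. The class name, the residual fusion, and the FIFO eviction rule below are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

class ObjectFeatureMemory:
    """Illustrative sketch: fixed-size memory of per-object feature
    vectors from past frames, fused with current detections via
    scaled dot-product attention. Not the paper's exact module."""

    def __init__(self, capacity=100, dim=256):
        self.capacity = capacity          # max stored object features
        self.dim = dim                    # feature dimensionality
        self.memory = np.empty((0, dim))  # (M, dim), M <= capacity

    def aggregate(self, feats):
        """feats: (N, dim) current-frame object features.
        Returns (N, dim) features enriched with temporal context."""
        if self.memory.shape[0] == 0:
            return feats
        # Attention: queries = current objects, keys/values = memory.
        scores = feats @ self.memory.T / np.sqrt(self.dim)  # (N, M)
        scores -= scores.max(axis=1, keepdims=True)          # stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        # Residual fusion keeps the current-frame signal dominant.
        return feats + weights @ self.memory

    def update(self, feats):
        """Append new object features; evict oldest beyond capacity (FIFO,
        an assumed eviction policy for this sketch)."""
        self.memory = np.vstack([self.memory, feats])[-self.capacity:]
```

Storing only N object vectors per frame instead of an H×W×C feature map is what keeps the memory and compute cost low, which is the trade-off the abstract highlights.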