Event Tubelet Compressor: Generating Compact Representations for Event-Based Action Recognition

2022 7th International Conference on Control, Robotics and Cybernetics (CRC) (2022)

Abstract
Event cameras asynchronously capture pixel-level intensity changes in a scene and output a stream of events. Compared with traditional frame-based cameras, they offer competitive imaging characteristics: low latency, high dynamic range, and low power consumption. This makes event cameras well suited to vision tasks in dynamic scenarios, such as human action recognition. The best-performing event-based algorithms convert events into frame-based representations and feed them into existing learning models. However, generating informative frames for long-duration event streams remains a challenge, because event cameras work asynchronously and have no fixed frame rate. In this work, we propose a novel frame-based representation for action recognition, named Compact Event Image (CEI). The representation is generated in a learnable way by a self-attention-based module named Event Tubelet Compressor (EVTC). The EVTC module adaptively summarizes the long-term dynamics and temporal patterns of events into a set of CEI frames. EVTC can be combined with conventional video backbones for end-to-end event-based action recognition. We evaluate our approach on three benchmark datasets, and the experimental results show that it outperforms state-of-the-art methods by a large margin.
Keywords
event camera, representation learning, self-attention mechanism, human action recognition
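
For readers unfamiliar with the frame-based representations that the abstract contrasts against, the following is a minimal sketch of a conventional fixed-window conversion: events are split into equal time bins and counted per pixel and polarity. It is not the EVTC module or the CEI representation, whose details are not given here. The function name `events_to_frames`, its parameters, and the assumed event layout (timestamp `t`, pixel coordinates `x` and `y`, polarity `p`) are hypothetical choices made for illustration.

```python
import numpy as np


def events_to_frames(t, x, y, p, height, width, num_bins):
    """Accumulate an event stream into `num_bins` two-channel count frames.

    Assumed (hypothetical) event layout: timestamps `t`, pixel columns `x`,
    pixel rows `y`, and polarities `p` (positive means brightness increase).
    Returns an array of shape (num_bins, 2, height, width), where channel 0
    counts non-positive events and channel 1 counts positive events.
    """
    t = np.asarray(t, dtype=np.float64)
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    p = np.asarray(p, dtype=np.int64)

    frames = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    if t.size == 0:
        return frames

    # Map each timestamp to an equal-width time bin; the last event is
    # clamped into the final bin.
    span = max(t.max() - t.min(), 1e-9)
    bins = ((t - t.min()) / span * num_bins).astype(np.int64)
    bins = np.minimum(bins, num_bins - 1)

    # Polarity selects the channel.
    chan = (p > 0).astype(np.int64)

    # np.add.at performs unbuffered accumulation, so repeated
    # (bin, channel, row, column) indices are all counted.
    np.add.at(frames, (bins, chan, y, x), 1.0)
    return frames
```

Because the bin width here is fixed by the stream duration rather than by its content, a long recording is either split into many frames or heavily blurred within a few, which is the difficulty with long-duration event streams that the abstract points to.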