Action identification with fusion of BERT and 3DCNN for smart home systems.

Thai Hoang Le, Tien Minh Le, Thu Anh Nguyen

Internet of Things (2023)

Abstract
Action identification for smart home ecosystems has attracted growing interest across related fields. Nevertheless, thoroughly understanding context and movement in video, which is the key component of an effective action identification system and of a vision-based smart home ecosystem, remains a challenge. Recently, 3DCNN architectures, including SlowFast, have proved to be a suitable solution to this problem owing to their ability to extract spatiotemporal features from video. Additionally, advances in Natural Language Processing, particularly BERT, have shown the potential of the attention mechanism for extracting the temporal relationships of movement throughout a video. We introduce a novel fusion approach that aims to fully exploit the temporal relationships within the spatiotemporal features produced by a 3DCNN and to improve the effectiveness of action identification systems for smart home surveillance. The proposed method consists of two models: SlowFast with two distinct ResNet-50 backbones for spatiotemporal feature extraction, and BERT for aggregating temporal relationships. These models are stacked sequentially using two fusion techniques, early-ensemble and late-ensemble, yielding a robust unified system that is competitive with state-of-the-art methods. This paper also contributes a new dataset, Kinetics51, for our experiments on these models. The proposed method achieves high top-1 accuracy on both datasets: 78.05% on HMDB-51, which contains 51 action classes, and 82.53% on Kinetics51, our dataset derived from Kinetics400, both using 10-crop testing. The performance on the challenging action identification datasets HMDB-51 and Kinetics51 demonstrates the robustness of the fusion technique applied to the two proposed models.
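The reported figures are top-1 accuracies under 10-crop testing. As a rough illustration of how such a number is usually computed, the sketch below averages softmax scores over the crops of each video and counts a video as correct when the highest averaged score matches its label. It is a minimal sketch in plain Python, not the authors' evaluation code: the function names (`softmax`, `average_scores`, `top1_accuracy`) and the toy logits are hypothetical, and the abstract does not specify whether scores are averaged before or after the softmax.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_scores(per_crop_logits):
    """Average class probabilities over all crops of one video.

    Assumption: probabilities (not raw logits) are averaged; the
    abstract does not state which variant the authors use.
    """
    probs = [softmax(logits) for logits in per_crop_logits]
    n_crops = len(probs)
    return [sum(column) / n_crops for column in zip(*probs)]


def top1_accuracy(video_crop_logits, labels):
    """Fraction of videos whose highest averaged score matches the label."""
    correct = 0
    for crops, label in zip(video_crop_logits, labels):
        scores = average_scores(crops)
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# Toy data: 2 videos, 2 crops each, 3 classes (real 10-crop testing uses 10 crops).
toy_logits = [
    [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # averaged scores favour class 0
    [[0.0, 0.0, 3.0], [0.0, 0.0, 1.0]],  # averaged scores favour class 2
]
toy_labels = [0, 1]
accuracy = top1_accuracy(toy_logits, toy_labels)
```

With these toy inputs the first video is classified correctly and the second is not, so `accuracy` evaluates to 0.5. The same averaging step could also combine the outputs of two models in a late-fusion setting, although the abstract does not describe how the late-ensemble variant merges its predictions.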
Key words
3DCNN, BERT, action