Video Captioning in Bengali With Visual Attention

2022 25th International Conference on Computer and Information Technology (ICCIT), 2022

Abstract
Generating automatic video captions is one of the most challenging Artificial Intelligence tasks, as it combines the Computer Vision and Natural Language Processing research areas. The task is even harder for a complex language like Bengali because video captioning datasets in Bengali are generally lacking. To address this, we introduce a fully human-annotated dataset of Bengali captions for the videos of the MSVD dataset. We propose a novel end-to-end architecture with an attention-based decoder to generate meaningful video captions in Bengali. First, spatial and temporal features of the videos are combined using Bidirectional Gated Recurrent Units (Bi-GRU) to produce the input feature, which is then fed to the attention layer together with embedded caption features. This attention mechanism models the interdependence between visual and textual representations. Then, a two-layer GRU takes these combined attention features and generates meaningful sentences. We trained this model on our proposed dataset and achieved scores of 39.35% BLEU-4, 59.67% CIDEr, and 65.34% ROUGE, which are state-of-the-art results compared to other video captioning work available in the Bengali language.
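The abstract describes the attention layer only at a high level. As a rough illustration, the following pure-Python sketch shows one attention step in which a caption embedding queries a sequence of per-frame visual features (standing in for the Bi-GRU outputs). It assumes scaled dot-product scoring, visual features and caption embeddings of equal dimension, and fusion by concatenation; none of these choices are confirmed by the source, and the function names `attend` and `combine` are hypothetical. It is not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, visual_feats: List[Vector]) -> Tuple[Vector, Vector]:
    """Hypothetical attention step: a caption embedding (query) attends over
    per-frame visual features (keys and values).

    Assumes scaled dot-product scoring and that every visual feature has the
    same dimension as the query. Returns (attention weights, context vector).
    """
    d = len(query)
    # One score per frame: similarity between caption embedding and frame feature.
    scores = [dot(query, v) / math.sqrt(d) for v in visual_feats]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the visual features.
    context = [sum(w * v[i] for w, v in zip(weights, visual_feats)) for i in range(d)]
    return weights, context


def combine(query: Vector, visual_feats: List[Vector]) -> Vector:
    """Hypothetical fusion: concatenate the attended visual context with the
    caption embedding. The result would be the input to the GRU decoder."""
    _, context = attend(query, visual_feats)
    return context + query


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy "frames"
    query = [1.0, 0.0]                              # toy caption embedding
    weights, ctx = attend(query, visual)
    print(weights, ctx, combine(query, visual))
```

Frames whose features align with the caption embedding receive larger weights, so the context vector leans toward them. In a trained model, learned projections would map the visual and textual features into a shared space before scoring; the equal-dimension assumption here only keeps the sketch short.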
Keywords
Video Captioning, GRU, Attention Mechanism, CNN, Embedding, Encoder-Decoder