
A Multi-modal System for Video Semantic Understanding

CCKS 2021 - Evaluation Track (2022)

Abstract
This paper proposes a video semantic understanding system based on multi-modal data fusion. The system comprises two sub-models, the video classification tag model (VCT) and the video semantic tag model (VST), which generate classification tags and semantic tags for videos, respectively. The VCT model uses a bidirectional LSTM and an attention mechanism to integrate the video features, which improves results compared with other methods. The VST model extracts semantic tags directly from text data using a combined RoBERTa and CRF model. We applied the system to CCKS 2021 Task 14 and achieved an F1 score of 0.5054, ranking second among 187 teams.
Keywords
Multi-modal representation, Semantic understanding, Video
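
The abstract states that the VCT model integrates video features with a bidirectional LSTM and an attention mechanism, but gives no formulas. As a rough illustration of the attention step only, the sketch below pools a sequence of per-frame vectors into a single video vector. It assumes dot-product scoring against a query vector; the function name `attention_pool`, the scoring form, and the toy inputs are hypothetical and not taken from the paper. In a real model the per-frame vectors would be BiLSTM hidden states and the query would be learned.

```python
import math


def attention_pool(hidden_states, query):
    """Fuse per-frame vectors into one vector with softmax attention.

    hidden_states: list of equal-length float lists (stand-ins for
                   BiLSTM outputs, one per frame; an assumption here).
    query:         float list of the same length (would be a learned
                   parameter in a trained model; fixed here).
    Returns (pooled_vector, attention_weights).
    """
    # Dot-product score for each time step (assumed scoring function).
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query))
              for h in hidden_states]

    # Numerically stable softmax over the time axis.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the frame vectors, one output value per dimension.
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return pooled, weights


# Toy example: three 2-dimensional frame vectors.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
pooled, weights = attention_pool(frames, query)
```

In this toy run the third frame aligns best with the query, so it receives the largest weight and dominates the pooled vector; this is the sense in which attention lets a model emphasize informative frames over a plain average.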