TLRNet: Tibetan Lip Reading Based on ResNet and BiGRU

Zhenye Gan, Xu Ding, Xinke Yu, Zhenxing Kong

2023 2nd Asia Conference on Electrical, Power and Computer Engineering (EPCE) (2023)

Abstract
Lip reading, also known as visual speech recognition, is a form of human-computer interaction based on visual information. Current lip-reading research focuses mainly on English and Mandarin Chinese, and there are relatively few studies on Tibetan, a low-resource minority language. This study therefore proposes TLRNet, a deep learning model for word-level visual speech recognition in Tibetan. The model combines ResNet-18, a residual neural network, with a bidirectional gated recurrent unit (BiGRU) layer. We train and evaluate it on the TLRW-50 dataset, which consists of fifty common Tibetan words. The proposed model achieves Top-1 and Top-5 classification accuracies of 41.82% and 59.37%, respectively, indicating its potential for recognizing spoken Tibetan words from visual cues.
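The Top-1 and Top-5 figures quoted above are standard top-k classification accuracies. As a minimal sketch of how such a metric is typically computed (not the authors' evaluation code; the function name `top_k_accuracy` and the toy scores are illustrative assumptions), a prediction counts as correct when the true word class is among the k highest-scoring classes:

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k top-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 3 samples, 4 classes (made-up scores, not TLRW-50 data).
toy_scores = [
    [0.10, 0.60, 0.20, 0.10],  # true class 1 is ranked first
    [0.40, 0.30, 0.20, 0.10],  # true class 1 is ranked second
    [0.50, 0.10, 0.05, 0.35],  # true class 2 is ranked last
]
toy_labels = [1, 1, 2]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

For a 50-word task such as TLRW-50, the same function would be called with 50 scores per sample and k=1 or k=5.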
Keywords
Lip reading, Tibetan, Residual network, Gated recurrent unit