A video indexing and retrieval computational prototype based on transcribed speech

MULTIMEDIA TOOLS AND APPLICATIONS(2021)

Abstract
Voice interaction with systems is attractive in medicine and other areas because it is user-friendly and flexible. Video indexing and retrieval have benefited from this resource. However, few initiatives use speech recognition to support both tasks. This work aims to develop and evaluate a prototype system that indexes and retrieves videos from speech transcription. In particular, the user narrates each video's content, and the system captures the utterance, transcribes it to text, and timestamps it. Simple text processing techniques are then applied to the transcript before indexing. Afterward, the user can query by speech or text to find relevant videos that were previously indexed. We conducted an experimental evaluation of the prototype on sets of 50 and 10 public videos. As part of this process, one collaborator manually narrated the 50 videos, while four others narrated a subset of 13 videos. An automatic narration scheme was also applied to this subset and to the set of 10 videos. The evaluation showed promising results for Brazilian Portuguese speech recognition and for retrieval performance. For example, the average word error rate was as low as 0.03 and the mean average precision was as high as 1.00. Besides performing well, the computational tool is flexible, since few changes are required to support other languages.
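The two metrics quoted in the abstract, word error rate (WER) and mean average precision (MAP), can be illustrated with a minimal sketch. It uses the standard textbook definitions and is not the authors' evaluation code; the function names, the lowercase-and-split-on-whitespace normalization, and the sample sentences are all assumptions made for illustration.

```python
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()  # assumed normalization: lowercase, whitespace split
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def average_precision(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """Mean of precision@k taken at each rank k where a relevant video appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, video_id in enumerate(ranked_ids, start=1):
        if video_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of the per-query average precision over all queries."""
    return mean(average_precision(ranked, relevant) for ranked, relevant in runs)
```

Under these definitions, a WER of 0.03 corresponds to roughly three word-level errors per hundred reference words, and a MAP of 1.00 means that, for every query, all relevant videos were ranked ahead of all non-relevant ones.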
Keywords
Computational web system, Google web speech, Speech to text, TF-IDF, Video retrieval, Vitrivr
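The keywords name TF-IDF, but the abstract does not say how it is applied. As a rough sketch only, the following shows one common way to rank narrated transcripts against a text query with TF-IDF weights and cosine similarity. The smoothed IDF formula, the whitespace tokenizer, and all function names are assumptions, not the prototype's actual pipeline.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def build_index(transcripts: dict[str, str]):
    """Return (idf, vectors) where vectors maps video id -> {term: tf-idf weight}."""
    counts = {vid: Counter(tokenize(text)) for vid, text in transcripts.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    # Smoothed inverse document frequency (an assumed variant).
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    vectors = {
        vid: {t: c * idf[t] for t, c in term_counts.items()}
        for vid, term_counts in counts.items()
    }
    return idf, vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, idf, vectors) -> list[tuple[str, float]]:
    """Rank indexed videos by cosine similarity to the query, best first."""
    query_counts = Counter(tokenize(query))
    # Query terms never seen at indexing time are ignored.
    q_vec = {t: c * idf[t] for t, c in query_counts.items() if t in idf}
    scored = [(vid, cosine(q_vec, vec)) for vid, vec in vectors.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)
```

A spoken query would first pass through the same speech-to-text step as the narrations, after which it can be handled exactly like a typed query.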