A Template Based Voice Trigger System Using Bhattacharyya Edit Distance

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5(2011)

Abstract
Dynamic Time Warping (DTW) is frequently used in isolated word recognition systems due to its simplicity and robustness to noise. However, the computational effort required by a DTW-based solution is proportional to the number of words registered in the system. Vector Quantization (VQ) is employed to alleviate this by converting the spoken input into a sequence of discrete symbols to be matched against the stored word templates. In this paper, we propose the use of the Bhattacharyya distance as the cost function for this pattern matching problem. The template used is a string of discrete symbols, each modeled by a Gaussian Mixture Model (GMM) representing a context-dependent sub-word unit. The system is tested on a 100-template matching task built from two registrations of 50 cable TV channel names to simulate a voice-triggered remote control. An average accuracy of 92% is obtained. A scheme is also proposed to enable guest users without registration data to use the system efficiently.
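
As a rough illustration of the idea described in the abstract, the sketch below uses a Bhattacharyya distance as the substitution cost inside an edit-distance recursion over discrete symbol strings. It is not the paper's implementation. Several things here are assumptions made for illustration: each symbol is modeled by a single diagonal-covariance Gaussian rather than a GMM, the insertion/deletion cost is a fixed constant (`indel_cost`), and the names `bhattacharyya_diag`, `weighted_edit_distance`, and `models` are hypothetical.

```python
import math


def bhattacharyya_diag(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Assumption: one Gaussian per symbol, used here as a simplification of
    the GMM-per-symbol modeling mentioned in the abstract.
    """
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        avg = 0.5 * (va + vb)                               # averaged variance
        total += 0.125 * (ma - mb) ** 2 / avg               # mean separation term
        total += 0.5 * math.log(avg / math.sqrt(va * vb))   # variance mismatch term
    return total


def weighted_edit_distance(seq, template, models, indel_cost=1.0):
    """Edit distance whose substitution cost is the Bhattacharyya distance
    between the models of the two symbols being compared.

    models maps a symbol to a (mean, variance) pair of equal-length lists.
    indel_cost is a hypothetical constant for insertions and deletions.
    """
    n, m = len(seq), len(template)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel_cost
    for j in range(1, m + 1):
        dist[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = bhattacharyya_diag(*models[seq[i - 1]], *models[template[j - 1]])
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,   # deletion from seq
                dist[i][j - 1] + indel_cost,   # insertion into seq
                dist[i - 1][j - 1] + sub,      # substitution (zero for identical models)
            )
    return dist[n][m]


# Toy one-dimensional symbol models, invented for this example.
models = {
    "a": ([0.0], [1.0]),
    "b": ([2.0], [1.0]),
}
score = weighted_edit_distance(["a", "b"], ["a", "a"], models)
```

Under these assumptions, recognition would amount to computing this score against every stored template and picking the template with the lowest value. Because acoustically similar symbols have a small Bhattacharyya distance, confusing them costs less than confusing dissimilar ones.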
Keywords
dynamic time warping, template matching, edit distance, isolated word recognition