Prediction Of Pronunciation Variations For Speech Synthesis: A Data-Driven Approach

2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING (2005)

Abstract
The fact that speakers vary pronunciations of the same word within their own speech is well known, but little has been done to automatically categorize and predict a speaker's pronunciation distribution for unit selection speech synthesis. Recent work demonstrated how to automatically identify a speaker's choice between full and reduced pronunciations using acoustic modeling techniques from speech recognition. Here, we extend this approach and show how its results can be used to predict a speaker's choice of pronunciations for synthesis. We apply machine learning techniques to the automatically categorized data to produce a pronunciation variation prediction model given only the utterance text - allowing the system to synthesize novel phrases with variations like those the speaker would make. Empirical studies show that we can improve automatic pronunciation labels and successfully use the results to predict pronunciations for future synthesized examples. The prediction results based on these automatic labels are very similar to those of models trained on human-labeled data - allowing us to reduce manual effort while still achieving comparable results.
Keywords
speech recognition,prediction model,databases,loudspeakers,predictive models,empirical study,acoustics,learning artificial intelligence,categorical data,speech synthesis,machine learning,natural languages,neural networks