Automatic generation of the complete vocal tract shape from the sequence of phonemes to be articulated

Speech Communication(2022)

Abstract
Articulatory speech synthesis requires generating realistic vocal tract shapes from the sequence of phonemes to be articulated. This work proposes the first model trained on rt-MRI films to automatically predict the contours of all vocal tract articulators. The data are the contours tracked in the rt-MRI database recorded for a single speaker. These contours were used to train an encoder–decoder network that maps the sequence of phonemes and their durations to the exact gestures performed by the speaker. Unlike other works, each articulator contour is predicted separately, which allows their interactions to be investigated. We measure four tract variables closely coupled with critical articulators and observe their variations over time. Tests demonstrate that the model produces high-quality shapes of the complete vocal tract, with good correlation between the predicted tract variables and the target variables observed in rt-MRI films, even though the tract variables are not included in the optimization procedure.
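The abstract does not define the four tract variables or how their correlation is scored. The example below is a minimal sketch under two assumptions: a constriction-degree tract variable is taken as the minimum distance between two articulator contours in each frame, and agreement is measured with Pearson's correlation coefficient. The helper names (min_distance, tract_variable_series, pearson, lips) and the toy lip data are hypothetical illustrations and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Contour = Sequence[Point]


def min_distance(contour_a: Contour, contour_b: Contour) -> float:
    """Smallest Euclidean distance between any vertex of contour_a and any vertex of contour_b.

    Hypothetical stand-in for a constriction-degree tract variable
    (e.g. lip aperture between upper- and lower-lip contours).
    Vertex-to-vertex only; a finer version would use point-to-segment distances.
    """
    return min(math.dist(p, q) for p in contour_a for q in contour_b)


def tract_variable_series(frames: Sequence[Tuple[Contour, Contour]]) -> List[float]:
    """Evaluate min_distance on each frame's pair of articulator contours."""
    return [min_distance(a, b) for a, b in frames]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


def lips(aperture: float) -> Tuple[Contour, Contour]:
    """Hypothetical toy frame: flat upper lip at y=aperture, flat lower lip at y=0."""
    return [(0.0, aperture), (1.0, aperture)], [(0.0, 0.0), (1.0, 0.0)]


if __name__ == "__main__":
    # Toy closing-then-opening gesture: "target" from rt-MRI, "predicted" from a model.
    target = tract_variable_series([lips(a) for a in [1.0, 0.5, 0.0, 0.5, 1.0]])
    predicted = tract_variable_series([lips(a) for a in [0.9, 0.6, 0.1, 0.4, 1.1]])
    print(f"correlation: {pearson(target, predicted):.3f}")
```

A per-frame series like this could be computed on both the predicted and the rt-MRI contours and then correlated. This matches the abstract's point that the tract variables are evaluated only after prediction and are never part of the training loss.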
Keywords
Phonetic-to-articulatory, Speech production, Vocal tract shape