A Diphone Sharing Method Towards Scalable Unit-training-based TTS

mag(2013)

Abstract
One of the most popular applications of text-to-speech (TTS) is in embedded devices. The limited resources of embedded devices require the footprint of the TTS system to be very small. Toshiba's TTS system for embedded devices is unit-training-based and uses the diphone as its basic unit. The trained diphone inventory occupies a large part of the footprint. This paper proposes a diphone sharing method to reduce the size of the trained diphone inventory. We use phonetic knowledge of Mandarin to cluster the vowels so that vowels in the same cluster can be shared. We also propose a method to automatically judge whether a shared diphone is good, and only those judged good are used in the inventory. With the diphone sharing method, the size of the trained diphone inventory is reduced by 40%, while subjective evaluation shows that performance remains almost the same as with the unshared diphone inventory.
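The sharing idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the vowel clusters, the choice of cluster representative, the function names (`build_vowel_map`, `share_diphones`, `stored_units`), and the `is_good` callback standing in for the automatic quality judgment are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Diphone = Tuple[str, str]  # (left phone, right phone)

# Hypothetical vowel clusters, for illustration only; the abstract does not
# list the clusters derived from Mandarin phonetic knowledge.
VOWEL_CLUSTERS = [{"a", "ia", "ua"}, {"o", "uo"}, {"e", "ie"}]


def build_vowel_map(clusters: Iterable[Set[str]]) -> Dict[str, str]:
    """Map every vowel to one representative of its cluster."""
    mapping: Dict[str, str] = {}
    for cluster in clusters:
        representative = min(cluster)  # arbitrary but deterministic choice
        for vowel in cluster:
            mapping[vowel] = representative
    return mapping


def share_diphones(
    inventory: Set[Diphone],
    vowel_map: Dict[str, str],
    is_good: Callable[[Diphone, Diphone], bool],
) -> Dict[Diphone, Diphone]:
    """Assign each diphone the trained unit it will use at synthesis time.

    A diphone borrows the unit of its cluster-representative counterpart only
    if that counterpart exists in the inventory and the (assumed) quality
    judge accepts the substitution; otherwise it keeps its own unit.
    """
    assignment: Dict[Diphone, Diphone] = {}
    for diphone in sorted(inventory):
        left, right = diphone
        candidate = (vowel_map.get(left, left), vowel_map.get(right, right))
        if candidate != diphone and candidate in inventory and is_good(diphone, candidate):
            assignment[diphone] = candidate
        else:
            assignment[diphone] = diphone
    return assignment


def stored_units(assignment: Dict[Diphone, Diphone]) -> Set[Diphone]:
    """Units that must actually be kept in the reduced inventory."""
    return set(assignment.values())


if __name__ == "__main__":
    inventory = {("b", "a"), ("b", "ia"), ("b", "ua"), ("a", "n"), ("ia", "n")}
    # Toy judge: reject one substitution to show that rejected diphones stay unshared.
    judge = lambda original, shared: original != ("b", "ua")
    assignment = share_diphones(inventory, build_vowel_map(VOWEL_CLUSTERS), judge)
    kept = stored_units(assignment)
    print(f"{len(inventory)} diphones, {len(kept)} stored units")
```

In this sketch the inventory shrinks because several diphones point at the same stored unit, while any substitution the judge rejects falls back to the diphone's own unit. A real judge would presumably compare acoustic or perceptual properties of the two units; the abstract does not say how.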
Key words
embedded system