Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization

Heiga Zen, Norbert Braunschweiler, Sabine Buchholz, Mark J. F. Gales, Kate Knill, Sacha Krstulovic, Javier Latorre

IEEE Transactions on Audio, Speech, and Language Processing (2012)

Abstract

An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language.
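The two transform types named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes the usual textbook forms, where a language-dependent mean is a weighted sum of cluster means and a constrained maximum-likelihood linear regression (CMLLR) transform is an affine map applied to the observation vector. The function names, dimensions, and all numbers are hypothetical, and the cluster-dependent decision trees and the estimation of the weights and transforms are omitted.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def interpolate_cluster_means(cluster_means: List[Vector], weights: Vector) -> Vector:
    """Language-dependent mean as a weighted sum of cluster mean vectors."""
    if len(cluster_means) != len(weights):
        raise ValueError("one weight per cluster is required")
    dim = len(cluster_means[0])
    return [
        sum(w * mean[d] for w, mean in zip(weights, cluster_means))
        for d in range(dim)
    ]


def apply_cmllr(observation: Vector, A: Matrix, b: Vector) -> Vector:
    """Speaker-dependent affine feature transform: A * o + b."""
    return [
        sum(a_rd * o_d for a_rd, o_d in zip(row, observation)) + b_r
        for row, b_r in zip(A, b)
    ]


# Toy values, invented purely for illustration (not from the paper).
cluster_means = [[1.0, 2.0], [3.0, 0.0]]   # two clusters, two dimensions
language_weights = [1.0, 0.5]              # hypothetical per-language weights
speaker_A = [[2.0, 0.0], [0.0, 1.0]]       # hypothetical per-speaker matrix
speaker_b = [0.5, -1.0]                    # hypothetical per-speaker bias

language_mean = interpolate_cluster_means(cluster_means, language_weights)
transformed_obs = apply_cmllr([1.0, 1.0], speaker_A, speaker_b)
```

Keeping the two functions separate mirrors the factorization idea: under these assumptions, the language weights can be swapped without touching the speaker transform, and vice versa.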

Keywords

decision trees, matrix decomposition, hidden Markov models, speech synthesis, interpolation, regression analysis, speech recognition
