Speech prosody generation for text-to-speech synthesis based on generative model of F0 contours.

INTERSPEECH (2014)

Abstract
This paper deals with the problem of generating the fundamental frequency (F0) contour of speech from a text input for text-to-speech synthesis. We have previously introduced a statistical model describing the generating process of speech F0 contours, based on the discrete-time version of the Fujisaki model. One remarkable feature of this model is that it has allowed us to derive an efficient algorithm based on powerful statistical methods for estimating the Fujisaki-model parameters from raw F0 contours. To associate a sequence of Fujisaki-model parameters with a text input through statistical learning, this paper proposes extending this model to a context-dependent one. We further propose a parameter training algorithm for the present model based on decision-tree-based context clustering.

Index Terms: Speech F0 contours, stochastic model, Fujisaki model, hidden Markov model, EM algorithm
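The abstract builds on the Fujisaki model, which describes log F0 as a baseline value ln Fb plus phrase components (impulse responses of a second-order critically damped filter, Gp(t) = α²t·e^(−αt)) and accent components (differences of clipped step responses, Ga(t) = min[1 − (1 + βt)e^(−βt), γ]). The sketch below evaluates this classic deterministic model on a sampled time grid to show what the "Fujisaki-model parameters" are: phrase command times T0 with amplitudes Ap, and accent command on/offset times T1, T2 with amplitudes Aa. It is not the authors' stochastic discrete-time formulation, nor their EM-based estimation or context-clustering training. The function names, the values α = 3, β = 20, γ = 0.9, the 5 ms frame shift, and the example commands are all illustrative assumptions.

```python
import numpy as np


def phrase_response(t, alpha):
    """Phrase-control impulse response Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    tc = np.clip(t, 0.0, None)  # clip first so exp() never overflows for negative t
    return np.where(t >= 0, alpha**2 * tc * np.exp(-alpha * tc), 0.0)


def accent_response(t, beta, gamma=0.9):
    """Accent-control step response Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    tc = np.clip(t, 0.0, None)
    g = np.minimum(1.0 - (1.0 + beta * tc) * np.exp(-beta * tc), gamma)
    return np.where(t >= 0, g, 0.0)


def fujisaki_log_f0(t, ln_fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """Hypothetical helper: log F0 from the classic Fujisaki model.

    phrases: list of (T0, Ap) phrase commands (impulses).
    accents: list of (T1, T2, Aa) accent commands (rectangular pulses).
    """
    t = np.asarray(t, dtype=float)
    y = np.full_like(t, ln_fb)  # baseline ln Fb
    for t0, ap in phrases:
        y += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        # A pulse from T1 to T2 yields the difference of two shifted step responses.
        y += aa * (accent_response(t - t1, beta, gamma) - accent_response(t - t2, beta, gamma))
    return y


# Usage example with assumed command values (5 ms frame shift, 2 s utterance).
t = np.linspace(0.0, 2.0, 401)
log_f0 = fujisaki_log_f0(
    t,
    ln_fb=np.log(100.0),                       # assumed baseline Fb = 100 Hz
    phrases=[(0.0, 0.5)],                      # one phrase command at utterance onset
    accents=[(0.3, 0.6, 0.4), (1.0, 1.3, 0.3)],  # two accent commands
)
f0 = np.exp(log_f0)  # F0 contour in Hz
```

Because every phrase and accent component is non-negative in this model, the generated contour never falls below Fb; the statistical version described in the abstract instead treats these commands as hidden variables to be inferred from observed F0.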