Multi-Stream Spectral Representation For Statistical Parametric Speech Synthesis

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016)

Abstract
In statistical parametric speech synthesis, such as hidden Markov model (HMM) based synthesis, one known problem is the over-smoothing of parameters, which makes the synthesised output sound muffled. In this paper, we propose an approach in which the high frequency spectrum is modelled separately from the low frequency spectrum. The high frequency band, which does not carry much linguistic information, is clustered using a very large decision tree so as to generate parameters as close as possible to natural speech samples. The boundary frequency between the two bands can be adjusted at synthesis time for each state. Subjective listening tests show that the proposed approach is significantly preferred over the conventional approach using a single spectrum stream. Samples synthesised using the proposed approach sound less muffled and more natural.
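
The idea of a per-state boundary frequency can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the two streams are parameterised or joined, so the sketch assumes each stream yields a full-length one-sided spectral envelope and that the two are joined by a hard switch at the boundary. The names `bin_frequencies`, `combine_streams`, and `state_boundaries_hz`, and all numeric values, are hypothetical.

```python
def bin_frequencies(n_bins, sample_rate):
    """Centre frequencies (Hz) of a one-sided spectrum from 0 to Nyquist.

    Assumes n_bins >= 2 and evenly spaced bins (an illustrative choice).
    """
    nyquist = sample_rate / 2.0
    return [nyquist * i / (n_bins - 1) for i in range(n_bins)]


def combine_streams(low_env, high_env, sample_rate, boundary_hz):
    """Take bins below boundary_hz from low_env and the rest from high_env.

    The hard switch is an assumption made for this sketch; a real system
    might cross-fade or work in a different parameter domain.
    """
    if len(low_env) != len(high_env):
        raise ValueError("both envelopes must have the same number of bins")
    freqs = bin_frequencies(len(low_env), sample_rate)
    return [lo if f < boundary_hz else hi
            for f, lo, hi in zip(freqs, low_env, high_env)]


# Toy envelopes: zeros mark the low-band stream, ones the high-band stream.
low = [0.0] * 5
high = [1.0] * 5

# Hypothetical boundary per state, chosen at synthesis time.
state_boundaries_hz = [4000.0, 6000.0]
frames = [combine_streams(low, high, 16000, b) for b in state_boundaries_hz]
```

With five bins at a 16 kHz sampling rate, the bin centres fall at 0, 2000, 4000, 6000 and 8000 Hz, so raising the boundary from 4000 Hz to 6000 Hz moves one more bin from the high-band stream to the low-band stream.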
Keywords
HMM-based speech synthesis, sub-band, over-smoothing, factorised speech representation