A Kullback-Leibler Divergence Based Recurrent Mixture Density Network for Acoustic Modeling in Emotional Statistical Parametric Speech Synthesis

MM '18: ACM Multimedia Conference, Seoul, Republic of Korea, October 2018

Abstract
This paper proposes a Kullback-Leibler divergence (KLD) based recurrent mixture density network (RMDN) approach for acoustic modeling in emotional statistical parametric speech synthesis (SPSS), aiming to improve model accuracy and emotion naturalness. First, to improve model accuracy, we propose using an RMDN as the acoustic model, which combines an LSTM with a mixture density network (MDN). Adding a mixture density layer enables multimodal regression as well as variance prediction, thus modeling more accurate probability density functions of the acoustic features. Second, we introduce Kullback-Leibler divergence regularization into model training. Inspired by the success of KLD in acoustic model adaptation, we improve emotion naturalness by maximizing the distance between the distributions of emotional speech and neutral speech. Objective and subjective evaluations show that the proposed approach improves both the prediction accuracy of acoustic features and the naturalness of the synthesized emotional speech.
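The following is a minimal sketch, assuming PyTorch, of the two ideas the abstract names: an LSTM whose output layer parameterizes a mixture density (a diagonal-covariance GMM per frame), trained by negative log-likelihood. The class name RMDN, the layer sizes, and the mixture count are illustrative choices, not the paper's settings.

```python
# Minimal RMDN sketch (PyTorch assumed): an LSTM feeding a mixture
# density output layer. Sizes and mixture count K are illustrative.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMDN(nn.Module):
    """LSTM whose output layer parameterizes a diagonal-covariance GMM."""
    def __init__(self, in_dim, hidden_dim, out_dim, num_mix):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        self.pi = nn.Linear(hidden_dim, num_mix)                 # mixture weights
        self.mu = nn.Linear(hidden_dim, num_mix * out_dim)       # component means
        self.log_var = nn.Linear(hidden_dim, num_mix * out_dim)  # log variances
        self.num_mix, self.out_dim = num_mix, out_dim

    def forward(self, x):                    # x: (batch, frames, in_dim)
        h, _ = self.lstm(x)
        b, t, _ = h.shape
        log_pi = F.log_softmax(self.pi(h), dim=-1)               # (b, t, K)
        mu = self.mu(h).view(b, t, self.num_mix, self.out_dim)
        log_var = self.log_var(h).view(b, t, self.num_mix, self.out_dim)
        return log_pi, mu, log_var

def mdn_nll(log_pi, mu, log_var, y):
    """Negative log-likelihood of target frames y under the predicted GMM."""
    y = y.unsqueeze(2)                       # broadcast over mixture components
    log_comp = -0.5 * ((y - mu) ** 2 / log_var.exp() + log_var
                       + math.log(2 * math.pi)).sum(dim=-1)
    return -torch.logsumexp(log_pi + log_comp, dim=-1).mean()
```

The KLD regularizer could then be sketched as below. Since the KLD between two GMMs has no closed form, this sketch assumes a component-wise diagonal-Gaussian approximation; the function names, the frozen neutral reference model, and the weight lambda_kld are hypothetical, not details from the paper.

```python
def diag_gauss_kld(mu_e, log_var_e, mu_n, log_var_n):
    """Closed-form KL(N_e || N_n) between diagonal Gaussians, summed over dims."""
    return 0.5 * (log_var_n - log_var_e - 1.0
                  + (log_var_e.exp() + (mu_e - mu_n) ** 2) / log_var_n.exp()
                  ).sum(dim=-1)

def kld_regularized_loss(emotional_model, neutral_model, x, y, lambda_kld=0.1):
    """Hypothetical training loss: MDN NLL on emotional data minus a weighted
    KLD to a frozen neutral model, so that minimizing the loss maximizes the
    emotional/neutral divergence the abstract describes."""
    log_pi, mu_e, log_var_e = emotional_model(x)
    with torch.no_grad():
        _, mu_n, log_var_n = neutral_model(x)   # neutral reference, not trained
    kld = diag_gauss_kld(mu_e, log_var_e, mu_n, log_var_n).mean()
    return mdn_nll(log_pi, mu_e, log_var_e, y) - lambda_kld * kld
```

Note the sign of the regularizer: subtracting the KLD term means gradient descent pushes the emotional model's output distribution away from the neutral model's, matching the abstract's goal of maximizing the distance between the two.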
Keywords
Emotional statistical parametric speech synthesis, recurrent mixture density network, LSTM, KLD-RMDN