Fine-grained style modelling and transfer in text-to-speech synthesis via content-style disentanglement.

CoRR (2020)

Abstract
This paper presents a novel neural model for fine-grained style modeling and transfer in expressive text-to-speech (TTS) synthesis. By applying collaborative learning and adversarial learning strategies with thoughtfully designed loss functions, the proposed model effectively disentangles the content and style factors of speech at the phoneme level. Speech style transfer is achieved by combining the style embedding extracted from a reference utterance with the phoneme embedding derived from the source text. Objective evaluation shows that the synthesized speech preserves the intended content and carries prosody similar to that of the reference speech. Subjective evaluation shows that the new model performs better than other fine-grained style transfer TTS models.
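
The style transfer step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `resample` and `combine_content_and_style` are hypothetical, the embeddings are plain lists of floats, "combining" is taken to mean per-phoneme concatenation, and a reference utterance with a different number of phonemes is aligned to the source text by nearest-index resampling.

```python
from typing import List

Vector = List[float]


def resample(seq: List[Vector], target_len: int) -> List[Vector]:
    """Stretch or shrink a sequence to target_len items by nearest-index lookup.

    Assumption for this sketch only: the paper's actual alignment between
    reference and source phonemes is not described in the abstract.
    """
    if not seq:
        raise ValueError("seq must not be empty")
    if target_len == 1:
        return [seq[0]]
    last = len(seq) - 1
    return [seq[round(i * last / (target_len - 1))] for i in range(target_len)]


def combine_content_and_style(
    phoneme_emb: List[Vector], style_emb: List[Vector]
) -> List[Vector]:
    """Pair each source phoneme embedding with one reference style embedding.

    Concatenation is an assumed combination rule, chosen for simplicity.
    """
    if len(style_emb) != len(phoneme_emb):
        style_emb = resample(style_emb, len(phoneme_emb))
    return [p + s for p, s in zip(phoneme_emb, style_emb)]


# Toy data: 3 source phonemes (content, dim 2), 5 reference phonemes (style, dim 1).
phoneme_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
style_emb = [[0.1], [0.2], [0.3], [0.4], [0.5]]

decoder_input = combine_content_and_style(phoneme_emb, style_emb)
print(len(decoder_input), len(decoder_input[0]))  # prints "3 3"
```

In this sketch the content part of each row comes only from the source text and the style part only from the reference utterance, which mirrors the disentanglement the abstract describes; a real system would feed the combined sequence to a decoder that produces speech.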
Keywords
style modelling, synthesis, fine-grained, text-to-speech, content-style