谷歌Chrome浏览器插件
订阅小程序
在清言上使用

Two-Stage Sequence-to-Sequence Neural Voice Conversion with Low-to-High Definition Spectrogram Mapping

intelligent information hiding and multimedia signal processing(2018)

引用 2|浏览6
暂无评分
摘要
In this study, we propose a voice conversion technique with two-stage conversion, which is realized by using two models consisting of U-Net and pix2pix. Using U-Net, we tried to reproduce intonation of a target speaker by performing low-dimensional feature conversion considering the time direction. We introduced pix2pix for the task of spectrogram enhancement. The pix2pix is trained to map from low definition spectrogram to high definition spectrogram (low-to-high spectrogram mapping). Low definition spectrogram is reconstructed from low dimensional mel-cepstrum converted by U-Net and high definition spectrogram is extracted from natural speech. In objective evaluations, we showed that the proposed method was effective in improvement of mel-cepstral distance (MCD) and Log F0 RMSE. Subjective evaluations revealed that the use of the proposed method had a certain effect in improving speech individuality while maintaining the same level of naturalness as the conventional method.
更多
查看译文
关键词
DNN-based voice conversion, U-Net, Pix2pix, CNN, Two-stage conversion
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要