DNN based multi-speaker speech synthesis with temporal auxiliary speaker ID embedding

2019 International Conference on Electronics, Information, and Communication (ICEIC), 2019

Abstract
In this paper, multi-speaker speech synthesis using speaker embedding is proposed. The proposed model is based on the Tacotron network, but its post-processing network is modified with dilated convolution layers, as used in the WaveNet architecture, to make it more adaptive to speech. The model can generate multiple speakers' voices with a single neural network by providing an auxiliary input, the speaker ID embedding, to the network. The model successfully generates two speakers' voices without significant deterioration of speech quality.
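The two mechanisms named in the abstract, dilated convolution in the post-processing network and an auxiliary speaker embedding fed alongside the acoustic features, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function names, kernel sizes, and embedding dimensions below are illustrative assumptions.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Causal 1-D dilated convolution: each kernel tap skips `dilation - 1`
    samples, widening the receptive field without extra parameters
    (the idea WaveNet-style post-nets rely on).
    x: (T,) input sequence; w: (K,) kernel weights."""
    T, K = len(x), len(w)
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            idx = t - k * dilation  # tap reaches back k*dilation steps
            if idx >= 0:
                y[t] += w[k] * x[idx]
    return y

def add_speaker_embedding(frames, spk_emb):
    """Tile a fixed speaker ID embedding across time and concatenate it
    to every acoustic frame, so one network can condition on speaker.
    frames: (T, D) features; spk_emb: (E,) embedding -> (T, D + E)."""
    T = frames.shape[0]
    tiled = np.tile(spk_emb, (T, 1))
    return np.concatenate([frames, tiled], axis=1)

# A unit impulse through a 2-tap kernel with dilation 2 shows the
# widened receptive field: the second tap lands 2 steps later.
y = dilated_conv1d(np.array([1.0, 0, 0, 0, 0]), np.array([1.0, 1.0]), 2)
# y == [1, 0, 1, 0, 0]

# Conditioning: 3 frames of 2-dim features gain a 4-dim speaker code.
cond = add_speaker_embedding(np.zeros((3, 2)), np.ones(4))
# cond.shape == (3, 6)
```

In the actual model the speaker embedding would be learned jointly with the network and the dilated layers stacked with increasing dilation rates; the sketch only shows the per-frame mechanics.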
Key words
Hidden Markov models, Speech synthesis, Data models, Synthesizers, Convolution, Decoding