Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

ICLR (Poster), 2018

Abstract
We present Deep Voice 3, a fully convolutional attention-based neural text-to-speech (TTS) system. Deep Voice 3 matches state-of-the-art neural speech synthesis systems in naturalness while training ten times faster. We scale Deep Voice 3 to data set sizes unprecedented for TTS, training on more than eight hundred hours of audio from over two thousand speakers. In addition, we identify common error modes of attention-based speech synthesis networks, demonstrate how to mitigate them, and compare several different waveform synthesis methods. We also describe how to scale inference to ten million queries per day on a single-GPU server.
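To put the inference figure in perspective, ten million queries per day works out to roughly 116 queries per second on average. The short Python sketch below does that conversion. The function name `queries_per_second` is hypothetical, and the calculation assumes load spread evenly over 24 hours, which the abstract does not state; real traffic would likely have peaks above this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def queries_per_second(queries_per_day: float) -> float:
    """Average sustained rate implied by a daily query volume.

    Assumes perfectly uniform load across the day (an illustrative
    simplification, not a claim made in the abstract).
    """
    return queries_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = queries_per_second(10_000_000)
    print(f"{rate:.1f} queries per second on average")
```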
Keywords
voice, convolutional sequence, learning, text-to-speech