
Corpus Design Using Convolutional Auto-Encoder Embeddings for Audio-Book Synthesis

INTERSPEECH (2019)

Abstract
In this study, we propose an approach to script selection for designing TTS speech corpora. A Deep Convolutional Neural Network (DCNN) is used to project linguistic information into an embedding space. The embedded representation of the corpus is then fed to a selection process that extracts a subset of utterances offering good linguistic coverage while tending to limit the repetition of linguistic units. We present two selection processes: a clustering approach based on utterance distance, and a method that tends toward a target distribution of linguistic events. We compare the synthetic signal quality of the proposed methods with that of state-of-the-art methods, both objectively and subjectively. Both kinds of measures confirm that the proposed methods yield speech corpora with better synthetic speech quality. The perceptual test shows that our TTS global cost can be used as an alternative measure of overall synthetic quality.
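
The second selection process, which steers the selected subset toward a target distribution of linguistic events, can be illustrated with a minimal sketch. The abstract does not specify the selection criterion or the unit inventory, so the following makes several assumptions: utterances are represented as plain lists of unit labels (not the DCNN embeddings used in the paper), the mismatch with the target is measured by Kullback-Leibler divergence, and selection is greedy. The names `kl_divergence` and `greedy_select` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def kl_divergence(target, counts, eps=1e-9):
    """KL(target || empirical), with light additive smoothing (an assumption)."""
    total = sum(counts.values())
    kl = 0.0
    for unit, p in target.items():
        if p <= 0:
            continue
        # Smoothed empirical probability of this unit in the selected subset.
        q = (counts.get(unit, 0) + eps) / (total + eps * len(target))
        kl += p * math.log(p / q)
    return kl


def greedy_select(utterances, target, budget):
    """Greedily pick up to `budget` utterance indices whose pooled unit
    distribution moves closest to `target` at each step."""
    selected = []
    counts = Counter()
    remaining = list(range(len(utterances)))
    while remaining and len(selected) < budget:
        # Choose the utterance that minimizes divergence once added.
        best = min(
            remaining,
            key=lambda i: kl_divergence(target, counts + Counter(utterances[i])),
        )
        selected.append(best)
        counts += Counter(utterances[best])
        remaining.remove(best)
    return selected, counts


# Toy corpus: each utterance is a list of (hypothetical) linguistic unit labels.
utterances = [["a", "a", "a"], ["b", "b"], ["a", "b"], ["c"]]
target = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}  # assumed uniform target

selected, counts = greedy_select(utterances, target, budget=2)
final_kl = kl_divergence(target, counts)
print(selected, dict(counts), final_kl)
```

In this toy setting the greedy criterion favors utterances that cover missing units over utterances that repeat units already present, which mirrors the stated goal of good coverage with limited repetition. A real system would replace the label lists with the embedded representation described in the abstract.
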
Key words
corpus design, deep neural networks, embedding space, clustering, text-to-speech synthesis