Unidirectional Long Short-Term Memory Recurrent Neural Network With Recurrent Output Layer For Low-Latency Speech Synthesis

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015

Abstract
Long short-term memory recurrent neural networks (LSTM-RNNs) have been applied to various speech applications, including acoustic modeling for statistical parametric speech synthesis. One concern in applying them to text-to-speech applications is their effect on latency. To address this concern, this paper proposes a low-latency, streaming speech synthesis architecture using unidirectional LSTM-RNNs with a recurrent output layer. The use of a unidirectional RNN architecture allows frame-synchronous streaming inference of output acoustic features given input linguistic features. The recurrent output layer further encourages smooth transitions between acoustic features at consecutive frames. Experimental results from subjective listening tests show that the proposed architecture can synthesize natural-sounding speech without requiring utterance-level batch processing.
Keywords
Statistical parametric speech synthesis, recurrent neural networks, long short-term memory, low-latency
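
The frame-synchronous streaming described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name `StreamingSynthesizer`, the layer sizes, the random weights, the omission of biases, and the specific output recurrence y_t = W_hy h_t + W_yy y_{t-1} are all assumptions made for illustration. The point it shows is that a unidirectional model can emit each output frame as soon as the matching input frame arrives, because it depends only on past state.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class StreamingSynthesizer:
    """Toy unidirectional LSTM with a recurrent output layer.

    Hypothetical illustration: weights are random and biases are omitted.
    """

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One matrix per gate (input, forget, output, cell candidate),
        # each acting on the concatenation [x_t, h_{t-1}].
        self.W = {g: rand_matrix(n_hidden, n_in + n_hidden, rng) for g in "ifoc"}
        self.W_hy = rand_matrix(n_out, n_hidden, rng)  # hidden -> output
        self.W_yy = rand_matrix(n_out, n_out, rng)     # previous output -> output
        self.h = [0.0] * n_hidden
        self.c = [0.0] * n_hidden
        self.y = [0.0] * n_out

    def step(self, x):
        """Consume one linguistic-feature frame, return one acoustic-feature frame."""
        z = list(x) + self.h
        i = sigmoid(matvec(self.W["i"], z))
        f = sigmoid(matvec(self.W["f"], z))
        o = sigmoid(matvec(self.W["o"], z))
        g = [math.tanh(a) for a in matvec(self.W["c"], z)]
        self.c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, self.c, i, g)]
        self.h = [oi * math.tanh(ci) for oi, ci in zip(o, self.c)]
        # Assumed recurrent output layer: the previous output frame feeds
        # into the current one, which couples consecutive frames.
        from_hidden = matvec(self.W_hy, self.h)
        from_prev = matvec(self.W_yy, self.y)
        self.y = [a + b for a, b in zip(from_hidden, from_prev)]
        return self.y


def stream(model, frames):
    """Yield one output frame per input frame, without seeing future frames."""
    for x in frames:
        yield model.step(x)
```

Because nothing in `step` looks ahead, the outputs for the first k frames are identical whether or not later frames exist, which is what removes the need for utterance-level batch processing. A bidirectional model would not have this property, since its backward pass needs the whole utterance.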