A Strategy for Improved Phone-Level Lyrics-to-Audio Alignment for Speech-to-Singing Synthesis

INTERSPEECH (2019)

Abstract
Speech-to-singing refers to techniques that transform speech into a singing voice. A major factor in the performance of this process is how precisely the phonetic sequence of the input speech can be aligned to the timing of the target singing. Unfortunately, the precision of existing techniques for phone-level lyrics-to-audio alignment has been found insufficient for this task. We propose a complete pipeline for automatic phone-level lyrics-to-audio alignment based on an HMM-based forced aligner and singing acoustics normalization. The system achieves phone-level precision in the range of a few tens of milliseconds, as we report in the objective evaluation. The subjective evaluation shows that the smoothness of the singing voice generated with the proposed method is close to that obtained using manual alignments.
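
As an illustration of what phone-level precision "in the range of a few tens of milliseconds" could mean in practice, the following is a minimal sketch of one common way to compare an automatic alignment against a manual one: the absolute deviation of each phone's start time. The abstract does not specify the metric used in the objective evaluation, so the function names (`boundary_errors_ms`, `fraction_within`), the data layout, and the example values are all assumptions made for illustration only.

```python
from statistics import mean


def boundary_errors_ms(reference, predicted):
    """Absolute start-time deviation per phone, in milliseconds.

    Both arguments are lists of (phone_label, start_seconds) pairs that are
    assumed to describe the same phone sequence in the same order.
    """
    if len(reference) != len(predicted):
        raise ValueError("alignments must contain the same number of phones")
    errors = []
    for (ref_phone, ref_start), (hyp_phone, hyp_start) in zip(reference, predicted):
        if ref_phone != hyp_phone:
            raise ValueError(f"phone mismatch: {ref_phone!r} vs {hyp_phone!r}")
        errors.append(abs(hyp_start - ref_start) * 1000.0)
    return errors


def fraction_within(errors_ms, tolerance_ms):
    """Share of phone boundaries whose error does not exceed the tolerance."""
    return sum(e <= tolerance_ms for e in errors_ms) / len(errors_ms)


# Hypothetical values for illustration; these are not data from the paper.
manual = [("s", 0.00), ("ih", 0.12), ("ng", 0.40)]
automatic = [("s", 0.01), ("ih", 0.15), ("ng", 0.38)]

errors = boundary_errors_ms(manual, automatic)
print("mean boundary error (ms):", round(mean(errors), 1))
print("fraction within 25 ms:", round(fraction_within(errors, 25.0), 2))
```

A tolerance-based score such as `fraction_within` complements the mean error, since a single badly placed boundary can dominate the mean while leaving most phones well aligned.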
Keywords
lyrics alignment, singing synthesis, text-to-speech, automatic speech recognition