Adapting pretrained speech model for Mandarin lyrics transcription and alignment.
CoRR (2023)
Abstract
The tasks of automatic lyrics transcription and lyrics alignment have
seen significant performance improvements in the past few years. However,
most previous work has focused only on English, for which large-scale datasets
are available. In this paper, we address lyrics transcription and alignment of
polyphonic Mandarin pop music in a low-resource setting. To deal with the data
scarcity issue, we adapt the pretrained Whisper model and fine-tune it on a
monophonic Mandarin singing dataset. With data augmentation and a source
separation model, the proposed method achieves a character error rate of less
than 18% for lyrics transcription on a Mandarin polyphonic dataset, and a mean
absolute error of 0.071 seconds for lyrics alignment. Our results demonstrate
the potential of adapting a pretrained speech model for lyrics transcription
and alignment in low-resource scenarios.
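The two reported metrics can be made concrete with a short sketch. It assumes the conventional definitions: character error rate (CER) as the character-level Levenshtein distance divided by the number of reference characters, and mean absolute error (MAE) as the average absolute difference between reference and predicted timestamps in seconds. The abstract does not state the exact evaluation protocol (for example, text normalization or which units' onsets are compared), so the function names and the whitespace handling below are illustrative assumptions, not the paper's code.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if chars match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = character edit distance / reference length.

    Assumption: whitespace is dropped, since Mandarin lyrics are not
    space-delimited.
    """
    ref_chars = [c for c in ref if not c.isspace()]
    hyp_chars = [c for c in hyp if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def mean_absolute_error(ref_times: Sequence[float],
                        pred_times: Sequence[float]) -> float:
    """MAE in seconds between paired reference and predicted timestamps
    (assumed here to be onset times of aligned lyric units)."""
    if len(ref_times) != len(pred_times) or not ref_times:
        raise ValueError("need equal-length, non-empty timestamp lists")
    return sum(abs(r - p) for r, p in zip(ref_times, pred_times)) / len(ref_times)


if __name__ == "__main__":
    # One deleted character out of five reference characters.
    print(character_error_rate("我爱你中国", "我爱你中"))
    print(mean_absolute_error([0.0, 1.0, 2.0], [0.1, 0.9, 2.0]))
```

Under these definitions, a CER below 18% means fewer than about 18 character-level edits per 100 reference characters, and an MAE of 0.071 seconds means predicted timestamps deviate from the reference by 71 ms on average.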
Keywords
Automatic lyrics transcription, automatic lyrics alignment, data augmentation, model adaptation