Enhancing Note-Level Singing Transcription Model with Unlabeled and Weakly Labeled Data

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Abstract
Note-level automatic singing transcription, which extracts onset, offset, and pitch information from a singing voice, is a crucial task in the field of Music Information Retrieval (MIR). Recent advances in deep learning models have led to significant progress in this field. However, annotating a training dataset requires professional music expertise, and the entire annotation process is time-consuming and labor-intensive. The field therefore suffers from a severe data scarcity problem. To address this issue, we developed a singing transcription model based on wav2vec 2.0, a pretrained speech representation model. The model can learn from unlabeled speech and weakly labeled singing data and use this knowledge to benefit the transcription task. Experiments show that our proposed method achieves a significant improvement over previous approaches on various benchmarks. Moreover, additional experiments demonstrate that our method achieves competitive performance even with a small proportion of the training data.
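The abstract describes producing (onset, offset, pitch) note events from a singing voice. The paper itself does not give implementation details here, but a common final step in note-level transcription systems is decoding frame-level onset and pitch posteriors into discrete notes. The following is an illustrative sketch of such a decoder, not the authors' actual method; the function name, thresholds, and greedy segmentation rule are assumptions.

```python
import numpy as np

def decode_notes(onset_prob, pitch, onset_thresh=0.5, min_frames=2):
    """Greedily decode (onset_frame, offset_frame, midi_pitch) note events
    from frame-level posteriors (illustrative sketch, not the paper's method).

    onset_prob: per-frame onset probability in [0, 1].
    pitch: per-frame MIDI pitch estimate; 0 means unvoiced.
    A note opens at an onset peak and closes when the frame becomes
    unvoiced or a new onset fires; its pitch is the median over the note.
    """
    notes = []
    start = None
    for t in range(len(onset_prob)):
        # Rising edge through the threshold counts as an onset.
        is_onset = onset_prob[t] >= onset_thresh and (
            t == 0 or onset_prob[t - 1] < onset_thresh
        )
        voiced = pitch[t] > 0
        if start is not None and (not voiced or is_onset):
            if t - start >= min_frames:  # drop spurious short notes
                notes.append((start, t, int(round(np.median(pitch[start:t])))))
            start = None
        if is_onset and voiced:
            start = t
    if start is not None and len(pitch) - start >= min_frames:
        notes.append((start, len(pitch), int(round(np.median(pitch[start:])))))
    return notes

# Two sung notes (C4 then D4) separated by an unvoiced frame:
onsets = np.array([0.9, 0.1, 0.1, 0.1, 0.95, 0.1, 0.1, 0.0])
pitches = np.array([60, 60, 60, 0, 62, 62, 62, 0])
print(decode_notes(onsets, pitches))  # → [(0, 3, 60), (4, 7, 62)]
```

In a full system, the onset and pitch posteriors would come from a network head on top of the wav2vec 2.0 encoder; the decoder above only turns those frame-level outputs into note events.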
Keywords
singing transcription, wav2vec 2.0, fine-tuning, pretrained models, music information retrieval