Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
arXiv (2024)
Abstract
Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio
quality and naturalness, yet they lack the capability to control the style
attributes of the synthesized singing explicitly. We propose Prompt-Singer, the
first SVS method that enables control over singer gender, vocal range, and
volume through natural language. We adopt a model architecture based on
a decoder-only transformer with a multi-scale hierarchy, and design a
range-melody decoupled pitch representation that enables text-conditioned vocal
range control while preserving melodic accuracy. Furthermore, we explore various
experimental settings, including different types of text representations, text
encoder fine-tuning, and introducing speech data to alleviate data scarcity,
aiming to facilitate further research. Experiments show that our model achieves
favorable control ability and audio quality. Audio samples are available at
http://prompt-singer.github.io .
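
The abstract does not spell out how the range-melody decoupled pitch representation is built. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below splits an F0 contour into a single "range" value (assumed here to be the geometric mean of the voiced frames) and a "melody" contour (assumed here to be semitone offsets from that mean). The function names `decouple_pitch` and `recombine`, and the convention that 0.0 marks an unvoiced frame, are hypothetical.

```python
import math


def decouple_pitch(f0_hz):
    """Split an F0 contour (Hz, 0.0 = unvoiced) into (range, melody).

    Assumption for illustration: "range" is the geometric mean of the
    voiced frames, "melody" is each frame's offset from it in semitones.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        # Nothing voiced: no range to report, melody is all unvoiced.
        return 0.0, [None] * len(f0_hz)
    # Averaging in the log domain gives the geometric mean in Hz.
    log_mean = sum(math.log2(f) for f in voiced) / len(voiced)
    center_hz = 2.0 ** log_mean
    # Semitone offsets are independent of the absolute register.
    melody = [12.0 * (math.log2(f) - log_mean) if f > 0 else None
              for f in f0_hz]
    return center_hz, melody


def recombine(center_hz, melody):
    """Rebuild an F0 contour from a range value and a melody contour."""
    return [center_hz * 2.0 ** (m / 12.0) if m is not None else 0.0
            for m in melody]


if __name__ == "__main__":
    contour = [220.0, 0.0, 440.0]
    center, melody = decouple_pitch(contour)
    # Doubling the range value shifts the contour up one octave
    # while leaving the melody offsets untouched.
    print(recombine(center * 2.0, melody))
```

Under these assumptions, changing only the range value transposes the whole contour while the interval structure stays fixed, which is the kind of separation that would let a text prompt steer vocal range without disturbing the melody.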