Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
CoRR (2024)
Abstract
In recent years, diffusion-based text-to-music (TTM) generation has gained
prominence, offering a novel approach to synthesizing musical content from
textual descriptions. Achieving high accuracy and diversity in this generation
process requires extensive, high-quality data, which often constitutes only a
fraction of available datasets. Within open-source datasets, the prevalence of
issues such as mislabeling, weak labeling, unlabeled data, and low-quality music
waveforms significantly hampers the development of music generation models. To
overcome these challenges, we introduce a novel quality-aware masked diffusion
transformer (QA-MDT) approach that enables generative models to discern the
quality of input music waveforms during training. Building on the unique
properties of musical signals, we adapt and implement an MDT model for
the TTM task and further reveal its distinct capacity for quality control.
Moreover, we address the issue of low-quality captions with a
caption-refinement data-processing approach. Our demo page is available at
https://qa-mdt.github.io/. Code is available at https://github.com/ivcylc/qa-mdt