WavCraft: Audio Editing and Generation with Large Language Models
arXiv (2024)
Abstract
We introduce WavCraft, a collective system that leverages large language
models (LLMs) to connect diverse task-specific models for audio content
creation and editing. Specifically, WavCraft describes the content of raw audio
materials in natural language and prompts the LLM conditioned on audio
descriptions and user requests. WavCraft leverages the in-context learning
ability of the LLM to decompose users' instructions into several tasks and
tackle each task collaboratively with the corresponding module. Through task
decomposition along with a set of task-specific models, WavCraft follows the
input instruction to create or edit audio content with more details and
rationales, facilitating user control. In addition, WavCraft is able to
cooperate with users via dialogue interaction and even produce the audio
content without explicit user commands. Experiments demonstrate that WavCraft
performs better than existing methods, especially when adjusting
the local regions of audio clips. Moreover, WavCraft can follow complex
instructions to edit and create audio content on top of input recordings,
supporting audio producers in a broader range of applications. Our
implementation and demos are available at
https://github.com/JinhuaLiang/WavCraft.
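
The workflow described in the abstract (describe the audio in text, prompt an LLM with that description and the user request, decompose the request into tasks, and hand each task to a dedicated module) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the WavCraft implementation: the names `Task`, `build_prompt`, `stub_planner`, `MODULES`, and `run_plan` are hypothetical, the planner is a hard-coded stand-in rather than a real LLM call, and audio tracks are represented as plain string labels instead of waveforms.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    module: str  # hypothetical module name, not a WavCraft identifier
    args: Dict[str, str] = field(default_factory=dict)


def build_prompt(audio_description: str, user_request: str) -> str:
    """Condition the planner on a text description of the audio plus the request."""
    return (
        f"Audio description: {audio_description}\n"
        f"User request: {user_request}\n"
        "Decompose the request into a list of tasks."
    )


def stub_planner(prompt: str) -> List[Task]:
    """Stand-in for the LLM: returns a fixed plan instead of calling a model."""
    return [
        Task("drop", {"target": "dog bark"}),
        Task("generate", {"text": "cat meow"}),
        Task("mix"),
    ]


# Toy "task-specific models": tracks are string labels, not real audio.
def drop(tracks: List[str], args: Dict[str, str]) -> List[str]:
    return [t for t in tracks if t != args["target"]]


def generate(tracks: List[str], args: Dict[str, str]) -> List[str]:
    return tracks + [args["text"]]


def mix(tracks: List[str], args: Dict[str, str]) -> List[str]:
    return ["mix(" + "+".join(tracks) + ")"]


MODULES: Dict[str, Callable[[List[str], Dict[str, str]], List[str]]] = {
    "drop": drop,
    "generate": generate,
    "mix": mix,
}


def run_plan(tracks: List[str], plan: List[Task]) -> List[str]:
    """Apply each planned task in order using the matching module."""
    for task in plan:
        tracks = MODULES[task.module](tracks, task.args)
    return tracks


prompt = build_prompt(
    "speech with a dog bark in the background",
    "Replace the dog bark with a cat meow.",
)
result = run_plan(["speech", "dog bark"], stub_planner(prompt))
print(result)  # ['mix(speech+cat meow)']
```

In this sketch the plan is an explicit list of steps, so a user could inspect or adjust it before execution, which is one plausible reading of how task decomposition facilitates user control.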