A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models
arXiv (2024)
Abstract
Copyright law confers upon creators the exclusive rights to reproduce,
distribute, and monetize their creative works. However, recent progress in
text-to-image generation has introduced formidable challenges to copyright
enforcement. These technologies enable the unauthorized learning and
replication of copyrighted content, artistic creations, and likenesses, leading
to the proliferation of unregulated content. Notably, models like Stable
Diffusion, which excel in text-to-image synthesis, heighten the risk of
copyright infringement and unauthorized distribution. Machine unlearning, which
seeks to eradicate the influence of specific data or concepts from machine
learning models, emerges as a promising solution: it can erase the
copyrighted content memorized by diffusion models. Yet, the absence of
comprehensive large-scale datasets and standardized benchmarks for evaluating
the efficacy of unlearning techniques in copyright protection scenarios
impedes the development of more effective unlearning methods. To address this
gap, we introduce a novel pipeline that harmonizes CLIP, ChatGPT, and diffusion
models to curate a dataset. This dataset encompasses anchor images, associated
prompts, and images synthesized by text-to-image models. Additionally, we have
developed a mixed metric based on semantic and style information, validated
through both human and artist assessments, to gauge the effectiveness of
unlearning approaches. Our dataset, benchmark library, and evaluation metrics
will be made publicly available to foster future research and practical
applications (website: https://rmpku.github.io/CPDM-page/; dataset:
http://149.104.22.83/unlearning.tar.gz).
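
As an illustration of what a mixed metric over semantic and style information could look like, the following is a minimal sketch and not the formulation used in the paper. It assumes each image has already been mapped to a semantic embedding (for example from CLIP) and a separate style embedding. The function names, the weight `alpha`, and the linear weighting are all hypothetical choices made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mixed_similarity(sem_anchor, sem_generated,
                     style_anchor, style_generated, alpha=0.5):
    """Hypothetical mixed score: a weighted sum of semantic and style similarity.

    alpha is an assumed weight in [0, 1]; the paper's actual combination
    rule is not given in the abstract.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    semantic = cosine_similarity(sem_anchor, sem_generated)
    style = cosine_similarity(style_anchor, style_generated)
    return alpha * semantic + (1.0 - alpha) * style
```

Under this sketch, a lower score between an anchor image and an image generated after unlearning would suggest that the model reproduces less of the protected content, though how the score should be thresholded is left open here.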