Duwak: Dual Watermarks in Large Language Models
arXiv (2024)
Abstract
As large language models (LLMs) are increasingly used for text generation
tasks, it is critical to audit their usage, govern their applications, and
mitigate their potential harms. Existing watermark techniques have been shown
effective at embedding a single human-imperceptible and machine-detectable
pattern without significantly affecting the quality and semantics of the generated text.
However, the efficiency of watermark detection, i.e., the minimum number of
tokens required to assert detection with significance, and the robustness
against post-editing remain debatable. In this paper, we propose Duwak to
fundamentally enhance the efficiency and quality of watermarking by embedding
dual secret patterns in both token probability distribution and sampling
schemes. To mitigate expression degradation caused by biasing toward certain
tokens, we design a contrastive-search-based watermark for the sampling scheme,
which minimizes token repetition and enhances diversity. We theoretically
explain the interdependency of the two watermarks within Duwak. We evaluate
Duwak extensively on Llama2 under various post-editing attacks, against four
state-of-the-art watermarking techniques and their combinations. Our results
show that Duwak-marked text achieves the highest watermarked text quality at
the lowest required token count for detection, up to 70% lower than
existing approaches, especially under paraphrasing post-editing.
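The abstract describes two watermark channels: biasing the token probability distribution and modifying the sampling scheme via contrastive search. The sketch below illustrates, under stated assumptions, what each channel type typically looks like: a KGW-style "green list" logit bias for the distribution channel, and a contrastive-search score (model confidence minus a degeneration penalty) for the sampling channel. The vocabulary, key, and parameters (DELTA, GAMMA, alpha) are illustrative assumptions, not Duwak's exact construction.

```python
import hashlib
import random

VOCAB = list(range(100))   # toy vocabulary of 100 token ids (assumption)
DELTA = 2.0                # logit bias added to "green" tokens (assumption)
GAMMA = 0.5                # fraction of the vocabulary marked green (assumption)

def green_list(prev_token: int, key: str = "secret") -> set:
    """Derive a keyed, pseudo-random green subset from (key, previous token)."""
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(GAMMA * len(VOCAB))))

def bias_logits(logits: list, prev_token: int) -> list:
    """Channel 1 (distribution side): shift probability mass toward green
    tokens by adding DELTA to their logits."""
    green = green_list(prev_token)
    return [l + DELTA if t in green else l for t, l in enumerate(logits)]

def contrastive_score(prob: float, cand_vec: list, ctx_vecs: list,
                      alpha: float = 0.6) -> float:
    """Channel 2 (sampling side): contrastive-search score = model
    confidence minus a degeneration penalty, here the maximum cosine
    similarity between the candidate and the context representations.
    Picking the argmax of this score discourages token repetition."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)
    penalty = max(cos(cand_vec, c) for c in ctx_vecs)
    return (1 - alpha) * prob - alpha * penalty
```

A detector would re-derive `green_list` with the shared key and count how many generated tokens fall in it; a count significantly above the GAMMA baseline signals the watermark.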