Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion
CoRR (2024)
Abstract
With the rise of Machine Learning as a Service (MLaaS) platforms, safeguarding
the intellectual property of deep learning models is becoming paramount. Among
various protective measures, trigger set watermarking has emerged as a flexible
and effective strategy for preventing unauthorized model distribution. However,
this paper identifies an inherent flaw in the current paradigm of trigger set
watermarking: evasion adversaries can readily exploit the shortcuts created by
models memorizing watermark samples that deviate from the main-task
distribution, which significantly impairs the models' generalization in adversarial
settings. To counteract this, we leverage diffusion models to synthesize
unrestricted adversarial examples as trigger sets. By training the model to
accurately recognize these examples, unique watermark behaviors are instilled
through knowledge injection rather than error memorization, thus avoiding exploitable
shortcuts. Furthermore, we uncover that the resistance of current trigger set
watermarking against removal attacks primarily relies on significantly damaging
the decision boundaries during embedding, intertwining unremovability with
adverse impacts. By optimizing the knowledge transfer properties of protected
models, our approach conveys watermark behaviors to extraction surrogates
without aggressive perturbation of the decision boundary. Experimental results on
CIFAR-10/100 and Imagenette datasets demonstrate the effectiveness of our
method, showing not only improved robustness against evasion adversaries but
also superior resistance to watermark removal attacks compared to
state-of-the-art solutions.
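
To make the described paradigm concrete, below is a minimal sketch (not the authors' implementation) of the two ingredients the abstract outlines: steering a reverse-diffusion step toward a defender-chosen key label via classifier guidance, and verifying ownership on the secret trigger set. The names `owner_model`, `suspect_model`, `base_mean`, `sigma_t`, the guidance scale, and the 0.8 agreement threshold are illustrative assumptions.

```python
# A minimal sketch, assuming a PyTorch classifier `owner_model` being
# protected and a DDPM-style sampler supplying `base_mean` and `sigma_t`
# at each reverse step; all names and constants are illustrative.
import torch
import torch.nn.functional as F


def classifier_guidance(owner_model, x_t, key_labels):
    """Gradient of log p(key_label | x_t) w.r.t. the noisy sample, used to
    bias each reverse-diffusion step toward the designated key labels
    (the classifier-guidance recipe of Dhariwal & Nichol, 2021)."""
    x = x_t.detach().requires_grad_(True)
    log_probs = F.log_softmax(owner_model(x), dim=1)
    selected = log_probs[torch.arange(x.shape[0]), key_labels].sum()
    return torch.autograd.grad(selected, x)[0]


def guided_mean(base_mean, sigma_t, grad, scale=2.0):
    """Shift the denoiser's predicted mean by the scaled guidance gradient,
    so sampling drifts toward images assigned the key label while staying
    on the natural-image manifold (an 'unrestricted' adversarial example)."""
    return base_mean + scale * sigma_t ** 2 * grad


@torch.no_grad()
def verify_watermark(suspect_model, trigger_images, key_labels, threshold=0.8):
    """Ownership check: a model derived from the protected one should
    reproduce the key labels on the secret trigger set."""
    suspect_model.eval()
    preds = suspect_model(trigger_images).argmax(dim=1)
    agreement = (preds == key_labels).float().mean().item()
    return agreement >= threshold, agreement
```

Because triggers synthesized this way remain natural, in-distribution images that the protected model classifies correctly, verification rests on injected knowledge rather than memorized errors, which is the distinction the abstract draws from prior trigger-set schemes.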