TMI! Finetuned Models Leak Private Information from their Pretraining Data
arXiv (2023)
Abstract
Transfer learning has become an increasingly popular technique in machine
learning as a way to leverage a pretrained model trained for one task to assist
with building a finetuned model for a related task. This paradigm has been
especially popular for $\textit{privacy}$ in machine learning, where the
pretrained model is considered public, and only the data for finetuning is
considered sensitive. However, there are reasons to believe that the data used
for pretraining is still sensitive, making it essential to understand how much
information the finetuned model leaks about the pretraining data. In this work
we propose a new membership-inference threat model where the adversary only has
access to the finetuned model and would like to infer the membership of the
pretraining data. To realize this threat model, we implement a novel
metaclassifier-based attack, $\textbf{TMI}$, that leverages the influence of
memorized pretraining samples on predictions in the downstream task. We
evaluate $\textbf{TMI}$ on both vision and natural language tasks across
multiple transfer learning settings, including finetuning with differential
privacy. Through our evaluation, we find that $\textbf{TMI}$ can successfully
infer membership of pretraining examples using query access to the finetuned
model. An open-source implementation of $\textbf{TMI}$ can be found
$\href{https://github.com/johnmath/tmi-pets24}{\text{on GitHub}}$.
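The abstract describes the attack at a high level: the adversary queries the finetuned model on a candidate pretraining example and feeds the resulting prediction vector to a metaclassifier trained on shadow finetuned models built with and without that example. Below is a minimal illustrative sketch of that pipeline, not the paper's implementation: shadow-model pretraining and finetuning are replaced by a synthetic simulation, and every name and parameter (`query_model`, `make_shadow`, the memorization shift) is a hypothetical stand-in.

```python
# Minimal sketch of a TMI-style metaclassifier attack. Shadow-model
# finetuning is replaced by a synthetic simulation; all names and
# parameters here are illustrative assumptions, not the paper's code.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, k, n_shadow = 16, 5, 128      # input dim, downstream classes, shadow models
target_x = rng.normal(size=d)    # candidate pretraining example

def query_model(params, x):
    """Stand-in for query access to a finetuned model: returns the
    softmax prediction vector over the downstream labels."""
    logits = params @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

def make_shadow(is_member):
    """Simulate a finetuned shadow model. For 'members', memorization of
    target_x is modeled as a systematic confidence shift (an assumption
    standing in for actual pretraining on the target)."""
    params = rng.normal(size=(k, d)) / np.sqrt(d)
    if is_member:
        params[0] += 2.0 * target_x / d   # synthetic memorization signal
    return params

# Train the metaclassifier on prediction vectors from shadow models
# built with (label 1) and without (label 0) the target example.
feats = np.array([query_model(make_shadow(i % 2 == 0), target_x)
                  for i in range(n_shadow)])
labels = np.array([int(i % 2 == 0) for i in range(n_shadow)])
meta = LogisticRegression(max_iter=1000).fit(feats, labels)

# Attack: query the victim finetuned model on the target example and
# score membership with the metaclassifier.
victim = make_shadow(is_member=True)  # pretend the victim memorized target_x
score = meta.predict_proba(query_model(victim, target_x)[None, :])[0, 1]
print(f"estimated membership probability: {score:.2f}")
```

In the paper's actual setting, the features come from querying the real finetuned model on the downstream task, and the shadow models are genuinely pretrained and finetuned; the sketch only shows how the metaclassifier ties prediction vectors to membership labels.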