Membership Inference Attacks With Token-Level Deduplication on Korean Language Models.

IEEE Access (2023)

Abstract
The confidentiality of training data has become a significant security concern for neural language models. Recent studies have shown that memorized training data can be extracted by injecting well-chosen prompts into generative language models. While these attacks have achieved remarkable success against English-based Transformer architectures, it is unclear whether they remain effective in other language domains. This paper studies the effectiveness of such attacks against Korean models and the potential attack improvements that might benefit future defense studies. The contribution of this study is twofold. First, we perform a membership inference attack against the state-of-the-art Korean GPT model. We found approximate training data with 20% to 90% precision among the top-100 samples, confirming that the attack technique proposed for vanilla GPT remains valid across language domains. Second, in this process, we observed that redundancy among the selected sentences can hardly be detected with the existing attack method. Since information that appears in only a few documents is more likely to be meaningful, increasing the uniqueness of the selected sentences improves the effectiveness of the attack. We therefore propose a deduplication strategy that replaces the traditional word-level similarity metric with one computed at the BPE token level. Our strategy removes 6% to 22% of the underestimated samples among those selected, thereby improving precision by up to 7 percentage points. As a result, we show that considering both language- and model-specific characteristics is essential to improving the effectiveness of attack strategies. We also discuss possible mitigations against membership inference attacks on general language models.
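To make the two steps described above concrete, the sketch below shows (1) ranking candidate generations by a simple perplexity score under the target model, a common membership-inference signal, and (2) deduplicating near-identical candidates by Jaccard similarity over BPE token sets rather than whitespace-split words. This is a minimal illustration under stated assumptions, not the authors' implementation: the model name `skt/kogpt2-base-v2` is a publicly available Korean GPT-2 used as a stand-in (the abstract does not name the target model), and the scoring function, `token_jaccard`, `top_unique`, and the 0.7 threshold are all hypothetical choices.

```python
# Sketch of the pipeline the abstract describes: rank candidates by a
# membership-inference score, then deduplicate at the BPE-token level.
# Model name, score, and threshold are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "skt/kogpt2-base-v2"  # hypothetical stand-in for the target Korean GPT
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    """Lower perplexity under the target model suggests memorization."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return float(torch.exp(loss))

def token_jaccard(a: str, b: str) -> float:
    """Similarity over BPE token sets instead of whitespace-split words."""
    ta, tb = set(tokenizer.tokenize(a)), set(tokenizer.tokenize(b))
    return len(ta & tb) / len(ta | tb) if (ta or tb) else 1.0

def top_unique(candidates: list[str], k: int = 100, threshold: float = 0.7) -> list[str]:
    """Rank by ascending perplexity, then greedily drop near-duplicates."""
    ranked = sorted(candidates, key=perplexity)
    kept: list[str] = []
    for c in ranked:
        if all(token_jaccard(c, prev) < threshold for prev in kept):
            kept.append(c)
        if len(kept) == k:
            break
    return kept
```

The design point worth noting is the similarity unit: Korean is agglutinative, so whitespace-delimited words conflate stems with particles and endings, whereas BPE tokens expose shared subword content, letting near-duplicates be detected even when surface word forms differ.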
Keywords
Training data, Deep learning, Measurement, Natural language processing, Data models, Entropy, Data mining, Confidentiality, Generative language model, Korean-based GPT, Membership inference, Training data extraction attack