NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human
arXiv (2024)
Abstract
Concerns about privacy leakage are growing in both academia and industry as
NLP models from third-party providers are used to process sensitive texts. To
protect privacy before sensitive data is sent to those models, we suggest
sanitizing sensitive text using two strategies commonly used by humans:
i) deleting sensitive expressions, and ii) obscuring sensitive details by
abstracting them. To explore these issues and develop a tool for text
rewriting, we curate the first corpus for this task, named NAP^2, through both
crowdsourcing and the use of large language models (LLMs). Compared with prior
work based on differential privacy, which causes a sharp drop in information
utility and produces unnatural text, the human-inspired approaches yield more
natural rewrites and offer a better balance between privacy protection and
data utility, as demonstrated by our extensive experiments.
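
The two rewriting strategies named in the abstract can be illustrated with a minimal sketch. This is not the paper's method or tool: the function names, the example sentence, and the hand-written span list and abstraction mapping are all hypothetical, and a real system would need to detect sensitive spans rather than receive them as input.

```python
import re

def delete_sensitive(text, spans):
    """Strategy i): remove each given sensitive expression outright.

    `spans` is a hypothetical, hand-supplied list of sensitive substrings.
    """
    for span in spans:
        text = text.replace(span, "")
    # Collapse the extra whitespace left behind by removed spans.
    return re.sub(r"\s+", " ", text).strip()

def abstract_sensitive(text, mapping):
    """Strategy ii): replace each sensitive detail with a more general phrase.

    `mapping` is a hypothetical dict from sensitive substring to abstraction.
    """
    for span, general in mapping.items():
        text = text.replace(span, general)
    return text

# Illustrative input, not taken from the NAP^2 corpus.
sentence = "I moved to Melbourne last year."

deleted = delete_sensitive(sentence, ["to Melbourne"])
abstracted = abstract_sensitive(sentence, {"Melbourne": "a large city"})

print(deleted)     # I moved last year.
print(abstracted)  # I moved to a large city last year.
```

Deletion removes the detail entirely, while abstraction keeps the sentence's structure and some of its information, which is the trade-off between privacy and utility that the abstract refers to.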