NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human

Authors :: Huang, Shuo
MacLean, William
Kang, Xiaoxi
Wu, Anqi
Qu, Lizhen
Xu, Qiongkai
Li, Zhuang
Yuan, Xingliang
Haffari, Gholamreza
Publication Year :: 2024
Abstract: Concerns about privacy leakage are growing in both academia and industry when NLP models from third-party providers are used to process sensitive texts. To protect privacy before sending sensitive data to those models, we suggest sanitizing sensitive text using two strategies commonly used by humans: i) deleting sensitive expressions, and ii) obscuring sensitive details by abstracting them. To explore these issues and develop a tool for text rewriting, we curate the first corpus, coined NAP^2, through both crowdsourcing and the use of large language models (LLMs). Compared with prior work based on differential privacy, which leads to a sharp drop in information utility and produces unnatural texts, the human-inspired approaches result in more natural rewrites and offer an improved balance between privacy protection and data utility, as demonstrated by our extensive experiments.
