Reidentification Risk in Panel Data: Protecting for k-Anonymity.

Authors :: Li, Shaobo
Schneider, Matthew J.
Yu, Yan
Gupta, Sachin
Source :: Information Systems Research; Sep2023, Vol. 34 Issue 3, p1066-1088, 23p
Publication Year :: 2023
Abstract: Market research companies collect extensive data on purchasing, travel, and app and media usage behaviors of consumers, prescriptions written by physicians, and so forth. Although the companies provide assurances of anonymity to the study participants, there is a significant concern about the vulnerability of these data. Could a motivated intruder match the pattern of purchases with the name and other personal and potentially sensitive details of an individual? We find that 17% to 94% of market research panelists in 15 frequently bought consumer goods categories are subject to high risk of reidentification through a potential record linkage attack based on their unique purchasing histories even when their identities are anonymized. We also demonstrate that the risk of reidentification in such data is vastly understated by the conventional measure, unicity, and propose a new measure, termed "sno-unicity." To protect the privacy of panelists, we consider the well-known privacy notion of k-anonymity and develop a new approach called "graph-based minimum movement k-anonymization" that is designed especially for retaining the usefulness of panel data. We show that our approach works well in protecting participants' privacy without substantially altering the information that marketers need for sound marketing decisions. We consider the risk of reidentification of panelists in marketing research data that are widely used to obtain insights into buyer behavior and to develop marketing strategy. We find that 17%–94% of the panelists in 15 frequently bought consumer goods categories are subject to high risk of reidentification through a potential record linkage attack based on their unique purchasing histories even when their identities are anonymized. We first demonstrate that the risk of reidentification is vastly understated by unicity, the conventional measure.
Instead, we propose a new measure of reidentification risk, termed sno-unicity, which accounts for the longitudinal nature of panel data, and show that it is much larger than unicity. To protect the privacy of panelists, we consider the well-known privacy notion of k-anonymity and develop a new approach called graph-based minimum movement k-anonymization (k-MM) that is designed especially for panel data. The proposed k-MM approach can be formulated as an optimization problem in which the objective is to minimally distort variables in the original data based on weights that users prespecify corresponding to their use case. We further show how our approach can be extended to achieve l-diversity. We apply the k-MM approach to two different panel data sets that are widely used in marketing research. To achieve a given privacy level, compared with several benchmark protection methods, the protected data from our method result in the least distortion in inferences about key marketing metrics, such as brand market shares, share of category requirements, brand switching rates, and marketing-mix parameters estimated from a hierarchical Bayesian brand choice model.
History: Param Singh, Senior Editor and Associate Editor.
Supplemental Material: The online appendix is available at https://doi.org/10.1287/isre.2022.1169.
[ABSTRACT FROM AUTHOR]
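
Illustrative note: the abstract relies on two standard notions, unicity and k-anonymity, which a small sketch may make concrete. The Python below assumes unicity in its common sense (the share of records that are unique on a chosen set of attributes) and k-anonymity in its usual sense (every record shares its attribute combination with at least k - 1 others). It does not implement the authors' sno-unicity measure or their k-MM method; the function names, the field names, and the toy panel are all hypothetical.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """For each record, return how many records share its quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [counts[k] for k in keys]


def unicity(records, quasi_identifiers):
    """Share of records that are unique on the given attributes (assumed definition)."""
    sizes = group_sizes(records, quasi_identifiers)
    return sum(1 for s in sizes if s == 1) / len(sizes)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every record is indistinguishable from at least k - 1 others."""
    return min(group_sizes(records, quasi_identifiers)) >= k


# Hypothetical toy panel: one purchase record per panelist.
panel = [
    {"id": "A", "brand": "X", "week": 1},
    {"id": "B", "brand": "X", "week": 1},
    {"id": "C", "brand": "Y", "week": 2},
    {"id": "D", "brand": "Z", "week": 2},
]

share_unique = unicity(panel, ["brand", "week"])
two_anon_full = is_k_anonymous(panel, ["brand", "week"], 2)
two_anon_week = is_k_anonymous(panel, ["week"], 2)
```

In this toy panel, panelists C and D are unique on (brand, week), so the data are not 2-anonymous on those attributes, while coarsening to week alone yields groups of two. A protection method such as the one the abstract describes would aim to reach the required group size while distorting the data as little as possible.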