1. Estimation of cost of k-anonymity in the number of dummy records
- Author
Ito, Satoshi and Kikuchi, Hiroaki
- Abstract
De-identification is a process that prevents individuals from being identified in the original transaction data by processing their personally identifiable information. k-anonymization, which processes data so that at least k users have the same records, is one of the representative methods of de-identification. One way to achieve k-anonymization is to add dummy records to the data to protect users who have unique histories. For this method, the cost of k-anonymization is the difference in the number of records between the original data and the processed data, and it can be calculated only after choosing the parameter k and processing the data. However, we want to calculate the cost before processing and find the optimal value of k, because processing big data with many different values of k is very costly. In this paper, we propose a new model of transaction data that gives the probability distribution and the expected value of the values in the data, under the assumption that all values occur independently with uniform probability. Applying our data model, it is possible to evaluate the cost of k-anonymization even before processing the data.
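The idea of estimating the dummy-record cost before processing can be illustrated with a small sketch. It rests on simplifying assumptions that may differ from the paper's actual model: each user contributes one record (their whole history, e.g. a tuple of attribute values), every user draws that record independently and uniformly from m possible records (for example m = v**L for L attributes with v values each), and a distinct record seen c times with 0 < c < k is padded with k - c dummy copies. The function names `dummy_cost`, `expected_dummy_cost`, and `simulated_dummy_cost` are hypothetical. Under these assumptions, linearity of expectation gives E[cost] = m · Σ_{c=1}^{k-1} (k − c) · C(n, c) · p^c · (1 − p)^(n − c) with p = 1/m, which is computable without touching the data.

```python
import random
from collections import Counter
from math import comb


def dummy_cost(records, k):
    """Count dummy records needed so every distinct record occurs at least k times.

    Assumed padding rule (for illustration only): a distinct record seen
    c times with 0 < c < k receives k - c dummy copies of itself.
    """
    counts = Counter(records)
    return sum(k - c for c in counts.values() if c < k)


def expected_dummy_cost(n, m, k):
    """Expected dummy cost when n users each draw one of m possible records
    independently and uniformly at random (assumed model, p = 1/m).

    Each of the m possible records occurs c times with Binomial(n, p)
    probability, so by linearity of expectation:
    E[cost] = m * sum_{c=1}^{k-1} (k - c) * C(n, c) * p^c * (1 - p)^(n - c)
    """
    p = 1.0 / m
    return m * sum(
        (k - c) * comb(n, c) * p**c * (1 - p) ** (n - c) for c in range(1, k)
    )


def simulated_dummy_cost(n, m, k, trials=1000, seed=0):
    """Monte Carlo estimate of the same expectation, by generating
    synthetic data and padding it, for comparison with the formula."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        records = [rng.randrange(m) for _ in range(n)]
        total += dummy_cost(records, k)
    return total / trials


if __name__ == "__main__":
    print(dummy_cost(["a", "a", "b", "c", "c", "c"], k=3))  # 3
    print(expected_dummy_cost(n=2, m=2, k=2))  # 1.0
    # Compare the closed-form estimate with simulation for several k.
    for k in (2, 3, 5):
        est = expected_dummy_cost(n=500, m=250, k=k)
        sim = simulated_dummy_cost(n=500, m=250, k=k)
        print(f"k={k}: expected={est:.2f} simulated={sim:.2f}")
```

The closed form costs O(k) arithmetic per candidate k, so sweeping k to look for a good trade-off is cheap, while the simulation (a stand-in for actually processing data) repeats the padding for every trial. Real transaction data is rarely uniform and independent, so such an estimate is only as good as the model assumption behind it.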
- Published
- 2023