Detection & Anonymization of Sensitive Information in Text: AI-Driven Solution for Anonymization
- Author
Böhlin, Felix
- Abstract
This report addresses data privacy, with a particular focus on detecting and anonymizing personally identifiable information (PII) in unstructured text. As the volume of accumulated digital data grows and large datasets drive advances in AI, it has become crucial to protect sensitive information from closed-source programs whose data-handling practices are opaque. This report explores efficient methods for identifying and anonymizing sensitive data, a key component of maintaining privacy and complying with the General Data Protection Regulation (GDPR). The study evaluates existing large language models (LLMs) and ways of prompting these models for specific purposes. It assesses how accurately they identify various types of PII and replace them with dummy data while preserving the utility of the text. The main contribution is a solution adaptable to the Swedish language and its formats, thereby helping organizations effectively manage their own and their customers’ data.
- Published
- 2024
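The abstract's replacement step (swapping detected PII for dummy data while keeping the text useful, with support for Swedish formats) can be illustrated with a minimal sketch. It is not the thesis's implementation: the detection step, which the study performs with prompted LLMs, is stubbed out as a list of already-detected character spans, and the `Span` type, the entity labels (`NAME`, `PERSONNUMMER`, `LOCATION`), the `DUMMY_NAMES` pool and all function names are hypothetical. The sketch assumes spans do not overlap. It replaces each span with a value of the same type, maps repeated mentions of the same value to the same dummy so the text stays coherent, and generates dummy Swedish personal identity numbers (personnummer) in the `YYMMDD-NNNC` layout with a valid Luhn check digit.

```python
import random
from dataclasses import dataclass

# Hypothetical pool of dummy names; not taken from the thesis.
DUMMY_NAMES = ["Erik Lindqvist", "Sara Holm", "Johan Berg", "Maria Ek"]


@dataclass(frozen=True)
class Span:
    """A detected PII span: character offsets [start, end) plus an entity label."""
    start: int
    end: int
    label: str  # hypothetical labels, e.g. "NAME", "PERSONNUMMER", "LOCATION"


def luhn_check_digit(digits: str) -> int:
    """Check digit for the 9-digit YYMMDDNNN body of a Swedish personnummer."""
    total = 0
    for i, ch in enumerate(digits):
        product = int(ch) * (2 if i % 2 == 0 else 1)  # weights 2,1,2,1,...
        total += product // 10 + product % 10  # add the digits of each product
    return (10 - total % 10) % 10


def fake_personnummer(rng: random.Random) -> str:
    """Random YYMMDD-NNNC string with a valid check digit (day capped at 28)."""
    body = (f"{rng.randint(0, 99):02d}{rng.randint(1, 12):02d}"
            f"{rng.randint(1, 28):02d}{rng.randint(0, 999):03d}")
    return f"{body[:6]}-{body[6:]}{luhn_check_digit(body)}"


def anonymize(text: str, spans: list[Span], seed: int = 0) -> str:
    """Replace non-overlapping spans with dummy values of the same type.

    The same original string under the same label always maps to the same
    dummy value, so repeated mentions stay linked after anonymization.
    """
    rng = random.Random(seed)
    generators = {
        "NAME": lambda: rng.choice(DUMMY_NAMES),
        "PERSONNUMMER": lambda: fake_personnummer(rng),
    }
    mapping: dict[tuple[str, str], str] = {}
    pieces, cursor = [], 0
    for span in sorted(spans, key=lambda s: s.start):
        key = (span.label, text[span.start:span.end])
        if key not in mapping:
            make = generators.get(span.label)
            # Labels without a generator fall back to a typed placeholder tag.
            mapping[key] = make() if make else f"[{span.label}]"
        pieces.append(text[cursor:span.start])
        pieces.append(mapping[key])
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


def find_spans(text: str, value: str, label: str) -> list[Span]:
    """Demo helper standing in for a detector: every occurrence of `value`."""
    spans, i = [], text.find(value)
    while i != -1:
        spans.append(Span(i, i + len(value), label))
        i = text.find(value, i + len(value))
    return spans


if __name__ == "__main__":
    text = "Anna Svensson (811218-9876) called. Anna Svensson lives in Lund."
    spans = (find_spans(text, "Anna Svensson", "NAME")
             + find_spans(text, "811218-9876", "PERSONNUMMER")
             + find_spans(text, "Lund", "LOCATION"))
    print(anonymize(text, spans))
```

Keeping one consistent dummy per original value is what preserves utility here: a reader (or a downstream model) can still tell that both mentions refer to the same person. Randomly generated personnummer can, by chance, coincide with real ones, so a production system might prefer drawing from a pool of designated test numbers instead.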