Automatic Anonymization of Forms
The City Council of Galapagar, in Madrid, tasked two of its administrators with anonymizing personal data such as names, surnames, National Identification Numbers (DNIs), addresses and telephone numbers in PDF documents. This process is essential to comply with data protection laws.
At WhiteBox, we quickly identified the opportunity to fully automate this process. We use next-generation large language models (LLMs) to detect and process the document fragments that contain sensitive information. We combine this with an efficient PDF file management system that replaces personal data with anonymizing marks while the rest of the document remains unchanged. We built the solution with open-source tools for manipulating PDFs and for developing with LLMs, including the LangChain framework. The result is a web tool that integrates the models responsible for the anonymization task: users upload PDF files and receive them back anonymized within a few minutes.
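The replacement step described above can be illustrated with a minimal sketch. It is not the WhiteBox implementation: the `Span` type, the `detect_spans` and `anonymize` functions, and the `[DNI]` / `[PHONE]` marks are hypothetical names chosen for illustration, and simple regular expressions stand in for the LLM-based detection. It also works on plain text rather than on a PDF. The point is only to show how detected fragments can be swapped for anonymizing marks while everything else is left untouched.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns, for illustration only. In the project described
# above, detection is done by an LLM, not by regular expressions.
DNI_PATTERN = re.compile(r"\b\d{8}[A-Za-z]\b")   # 8 digits plus a letter
PHONE_PATTERN = re.compile(r"\b[6-9]\d{8}\b")    # 9-digit Spanish number


@dataclass(frozen=True)
class Span:
    start: int   # index of the first sensitive character
    end: int     # index just past the last sensitive character
    label: str   # kind of data, used to build the anonymizing mark


def detect_spans(text: str) -> list[Span]:
    """Stand-in detector that returns sensitive spans sorted by position."""
    spans = [Span(m.start(), m.end(), "DNI") for m in DNI_PATTERN.finditer(text)]
    spans += [Span(m.start(), m.end(), "PHONE") for m in PHONE_PATTERN.finditer(text)]
    return sorted(spans, key=lambda s: s.start)


def anonymize(text: str, spans: list[Span]) -> str:
    """Replace each span with a mark and copy all other text unchanged."""
    parts = []
    cursor = 0
    for span in spans:
        if span.start < cursor:
            continue  # skip a span that overlaps one already replaced
        parts.append(text[cursor:span.start])
        parts.append(f"[{span.label}]")
        cursor = span.end
    parts.append(text[cursor:])
    return "".join(parts)


sample = "Juan Pérez, DNI 12345678Z, teléfono 612345678."
print(anonymize(sample, detect_spans(sample)))
```

Note that the name in the sample is left in place: free-form data such as names and addresses cannot be caught reliably with fixed patterns, which is the kind of case where a language model is needed for detection.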
Automation has freed up valuable time for the administrative staff of Galapagar City Council, allowing them to focus on improving bureaucratic efficiency and citizen services. This project positions the City Council among the pioneering local administrations, highlighting its commitment to innovation and technological advancement.