🗼CNIL - Reuse of Databases
A necessary look at compliance checks
The National Commission on Informatics and Liberties (CNIL) has published an important guide [1] detailing the checks needed to ensure legal compliance when reusing databases. The topic isn't new, and I may even have mentioned it in a previous note, but the rapid evolution of technology and the growing use of data for AI and scientific research make it more relevant than ever.
As a professional in this field, balancing innovation with regulatory compliance is always a challenge, especially when dealing with our alphabet soup: GDPR, PIA, LIA, TIA, AI-ACT, SORA, DORA…
The CNIL, or Commission Nationale de l'Informatique et des Libertés (National Commission on Informatics and Liberties), is the French authority responsible for protecting personal data and ensuring privacy rights in digital environments. Established in 1978 with the approval of the French Data Protection Act ("Loi Informatique et Libertés"), the CNIL plays a significant role in the European landscape, especially with the implementation and enforcement of the GDPR.
The document [2] emphasizes the importance of verifying whether the creation or sharing of a database is legally acceptable. This includes confirming that the data was not obtained illicitly, such as through leaks or theft. This requirement is not exclusive to European regulations; in Brazil, the LGPD also reinforces that the use of data must have a clear legal basis, and reusing information from questionable sources can lead to sanctions for the company. Data controllers are responsible for ensuring that every operation complies with the requirements of the GDPR and other applicable regulations.
How can we determine whether a specific dataset contains illegally collected data?
The text addresses the concept of "manifestly illegal data" well. A classic example is a database acquired on the dark web: using it directly violates both the GDPR and criminal law, such as the French Penal Code, and other jurisdictions with privacy and data protection laws have equivalent provisions.
However, the responsibility for reuse should not be underestimated, even in less obvious cases. CNIL's guidance suggests the need for due diligence, especially when there is uncertainty about the origin or compliance of the dataset. This caution mirrors what is expected in data protection impact assessments (DPIAs), which help identify and mitigate potential risks.
Companies working with large data volumes, whether for database enrichment or training LLMs, must ensure the data's origin is sound. If in a privacy program we embrace "Privacy by Design," does an AI governance program lead us to adopt "Data by Design"?
Another interesting point is the recommendation to incorporate compliance checks into contracts with the original holders of databases, typically third-party operators. Beyond providing legal protection for both parties, this practice enhances transparency and trust in data processing. However, it does not eliminate the need for continuous checks on the compatibility between the original purpose of data collection and the intended new uses. This is a common challenge in projects involving secondary data and underscores the importance of proper governance.
For instance, if data is collected for a specific purpose linked to a legal basis, what happens if a new purpose emerges? Continuously revisiting the program and updating data mapping becomes essential for the safe use of data, especially when the data is publicly accessible (originally published for a specific purpose) or shared by third parties under similar constraints.
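To make the data-mapping idea a bit more concrete, here is a minimal sketch of how a reuse request could be compared against the purposes documented for a dataset. Everything in it is hypothetical: the `DatasetRecord` fields, the `needs_review` helper, and the sample values are my own illustration and do not come from the CNIL guide. A match on purpose is also not a legal compatibility assessment; the check only flags cases that should go to the privacy team.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One entry in a hypothetical data-mapping inventory."""
    name: str
    source: str        # where the dataset came from
    legal_basis: str   # legal basis documented at collection time
    documented_purposes: set[str] = field(default_factory=set)


def needs_review(record: DatasetRecord, intended_purpose: str) -> bool:
    """Return True when the intended use is not among the documented purposes."""
    return intended_purpose not in record.documented_purposes


# Illustrative values only.
crm = DatasetRecord(
    name="customer_contacts",
    source="third-party provider",
    legal_basis="contract",
    documented_purposes={"billing", "customer support"},
)

print(needs_review(crm, "billing"))         # already documented purpose
print(needs_review(crm, "model training"))  # new purpose, flag for assessment
```

The point of keeping the check this simple is that it forces the inventory to stay up to date: a new use either appears in the mapping with its own assessment, or it gets flagged.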
The reuse of databases offers opportunities but also presents technical and ethical challenges. CNIL’s recommendations serve as a reminder that while innovation should be encouraged, it cannot come at the expense of privacy and data security.
I believe these guidelines can be applied to any organization, regardless of size or location, as an essential preventive measure. After all, the cost of ignoring these issues is likely far greater than the effort of implementing compliance practices. Whether we're in Europe, Brazil, the United Stat... well, let’s save that part for another chapter.
[1] https://www.cnil.fr/fr/reutilisation-de-bases-de-donnees-les-verifications-necessaires-pour-respecter-la-loi
[2] https://drive.google.com/file/d/1Dvzq_mt54js9BZlFIRv-5G9ik2u6p3Ud/view?usp=sharing