The relationship between AI, personal data, and the GDPR is under intense scrutiny, and an upcoming judgment from the Court of Justice of the European Union (CJEU) could redefine how businesses approach compliance, for better or worse depending on the outcome.
In early September 2025, the Court will rule in Single Resolution Board v. EDPS (the SRB case), a decision that could reshape the legal meaning of personal data and its application to AI governance.
Moving Beyond the “Anything Identifiable” Standard
Under rigid and, in my view, unrealistic interpretations, almost any data that could theoretically be linked to an individual qualifies as personal data under the GDPR. This broad approach ignores the actual circumstances of processing and creates practical challenges for AI systems, particularly General Purpose AI (GPAI) models trained on vast, unstructured datasets.
A growing body of legal reasoning — supported by Recital 26 of the GDPR and past CJEU judgments (Breyer, Scania, IAB Europe) — suggests a more targeted test:
- Personal data should be assessed in light of whether the specific controller or processor has lawful and realistic means to identify an individual.
- If a recipient lacks the technical or legal capability to perform identification, the data may not be considered personal from their perspective, even if the original controller could.
This relative identifiability approach has been endorsed by the Advocate General in the SRB case, who noted that pseudonymized data can fall outside the scope of the GDPR for recipients when re-identification is virtually impossible.
Why This Matters for AI and Personal Data
Large-scale AI training frequently involves data scraped from the web: text, images, and other content that may refer to real people. Under the “absolute” approach, any trace of such information would trigger GDPR compliance obligations, including the rights of access, rectification, and erasure. For AI providers, this can become an unmanageable task.
By contrast, a relative approach recognises that AI developers may work with data they cannot, in practice, link to a living individual. If they can show that they lack the ability to identify anyone and have implemented safeguards to prevent re-identification, the training data may not be classified as personal data in the first place.
The Memorisation Question in AI Models
One of the more contentious privacy issues for AI is memorisation — the fact that certain models, like large language models, can sometimes reproduce snippets from their training data. Critics argue that this proves the model “contains” personal data. Advocates for the relative view disagree, pointing out that the key test under the GDPR should be whether the AI provider can identify the person behind that data.
This doesn’t mean dismissing risks such as malicious attempts to extract information. Instead, it means acknowledging that AI outputs are generated through probabilistic pattern matching, not as direct retrievals from a structured personal data database.
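To make the distinction concrete, here is a minimal sketch of one common safeguard against memorisation: an output filter that flags long verbatim overlaps between generated text and a reference set of training snippets. All names and the threshold below are illustrative assumptions, not any provider’s actual tooling.

```python
# Hypothetical sketch of a memorisation safeguard: flag model outputs that
# reproduce long verbatim word sequences from training data. Names and the
# threshold are illustrative assumptions, not any provider's actual API.

NGRAM_SIZE = 8  # flag any 8-word verbatim overlap; the threshold is a policy choice


def ngrams(text: str, n: int) -> set:
    """Return the set of word-level n-grams occurring in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_training_index(training_snippets: list) -> set:
    """Pre-compute n-grams for (a sample of) the training corpus."""
    index = set()
    for snippet in training_snippets:
        index |= ngrams(snippet, NGRAM_SIZE)
    return index


def flags_memorisation(output: str, index: set) -> bool:
    """True if the output shares a long verbatim n-gram with the training data."""
    return not ngrams(output, NGRAM_SIZE).isdisjoint(index)


# Usage: outputs that trip the check can be blocked or rewritten before release.
index = build_training_index(
    ["Jane Doe filed the complaint on 3 May 2021 before the Vienna court"]
)
print(flags_memorisation("The complaint concerned the Vienna court", index))  # False
print(flags_memorisation(
    "jane doe filed the complaint on 3 may 2021 before the court", index))   # True
```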
Lessons from the SRB Case for AI Compliance
In the SRB case, data was heavily pseudonymized and aggregated before being shared. The recipient only saw grouped comments linked to random identifiers, without access to any keys for re-identification. The Advocate General found that this created a scenario where identifying an individual was virtually impossible.
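As a simplified illustration of that setup (a sketch only, not the actual SRB pipeline; all names and data are hypothetical): the sharing party replaces each author with a random token and retains the mapping, so the recipient sees only grouped comments under tokens it has no means to reverse.

```python
import secrets
from collections import defaultdict

# Simplified sketch of the SRB-style data flow (hypothetical names): the
# original controller swaps author identities for random tokens and keeps
# the key table, so the recipient receives only grouped, unlinkable comments.

comments = [
    ("alice@example.com", "The valuation method is flawed."),
    ("bob@example.com", "I disagree with the resolution decision."),
    ("alice@example.com", "Creditors were treated unequally."),
]

key_table = {}                # author -> token; never leaves the original controller
grouped = defaultdict(list)   # comments grouped under random IDs; all the recipient gets

for author, text in comments:
    if author not in key_table:
        key_table[author] = secrets.token_hex(8)
    grouped[key_table[author]].append(text)

# From the recipient's perspective the tokens are random identifiers with no
# lawful or technical route back to a person: the Advocate General's
# "virtually impossible" re-identification scenario.
print(dict(grouped))
```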
For AI compliance, the same principles apply:
- Apply pseudonymization to training data before ingestion.
- Aggregate and transform identifiers so they cannot be reversed.
- Ensure internal systems lack any functionality to re-link outputs to identifiable individuals.
These measures can help position AI providers within a compliance framework where certain datasets may fall outside the scope of personal data under the GDPR.
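A minimal sketch of what the first two measures in the list above could look like in a pre-ingestion pipeline. The regex patterns, the keyed hash, and the key-destruction step are assumptions for illustration; a production pipeline would rely on dedicated PII-detection tooling rather than two regexes.

```python
import hashlib
import hmac
import re

# Illustrative pre-ingestion step (assumptions throughout): strip direct
# identifiers from training records and replace residual IDs with a keyed
# one-way hash so they cannot be reversed downstream.

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def pseudonymize_id(value: str, key: bytes) -> str:
    """Keyed one-way hash; destroying the key afterwards makes reversal impracticable."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:12]


def scrub(record: str, key: bytes) -> str:
    """Replace direct identifiers with irreversible placeholder tokens."""
    record = EMAIL_RE.sub(lambda m: f"<id:{pseudonymize_id(m.group(), key)}>", record)
    record = PHONE_RE.sub("<phone>", record)
    return record


key = b"per-ingestion-key"  # assumption: rotated and destroyed once ingestion completes
print(scrub("Contact Jane at jane.doe@example.com or +43 660 1234567.", key))
```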
Balancing Privacy and Innovation in AI
Adopting a relative definition of personal data does not weaken privacy standards. Instead, it focuses GDPR protections where they are most needed and avoids applying burdensome rules where there is no realistic privacy risk.
Under such a system:
- Controllers who can realistically identify individuals remain fully bound by the GDPR.
- Recipients with no means of identification may have lighter compliance obligations, while still adhering to ethical and contractual safeguards.
What Comes Next for AI and GDPR Compliance
If the CJEU endorses the relative approach, it could create a more workable balance between protecting individuals and enabling innovation in AI. AI developers would still need to assess risks and implement safeguards, but they could do so in line with practical identifiability rather than hypothetical scenarios.
This shift would be particularly significant for GPAI models, where training datasets often include information that the developer has no realistic means to link to a specific person. By aligning the legal definition of personal data with the realities of AI development, Europe could maintain strong privacy protections while avoiding unnecessary restrictions on innovation.
The upcoming ruling has the potential to be a turning point — not just for data protection law, but for how AI and personal data coexist under the GDPR in the years to come.
On the same topic, you can read our AI Law Journal, Diritto Intelligente, and the articles and materials available HERE.