The definition of personal data remains unchanged under the Digital Omnibus. What this means for AI training and legitimate interest under GDPR.
The Definition of Personal Data and AI Training: Why the Digital Omnibus Matters
The definition of personal data and AI training remain tightly connected after the latest developments on the Digital Omnibus. According to recent information from Brussels, the European legislator does not intend to modify or narrow the scope of “personal data” under the GDPR.
At first glance, this may seem like a technical confirmation. However, for AI developers and companies deploying large-scale AI systems, the impact is significant.
By keeping the definition of personal data unchanged, the European framework preserves the broad interpretative perimeter that already applies to AI training activities. As a result, legal uncertainty remains.
Why the Definition of Personal Data Directly Affects AI Training
Under Article 4(1) GDPR, personal data includes any information relating to an identified or identifiable natural person. European case law has consistently interpreted identifiability broadly. Even indirect or contextual identifiers may be sufficient.
For AI training, this is critical.
Training datasets often include:
- Publicly available web content
- Text corpora containing embedded identifiers
- Metadata and contextual information
- Scraped online material
Even where direct identifiers are removed, the risk of re-identification or inference may prevent datasets from qualifying as anonymous. Consequently, much AI training continues to fall within the GDPR framework.
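To make the re-identification risk concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the record fields, the auxiliary directory, and the assumption that an adversary holds such a directory are hypotheticals, not a description of any real dataset.

```python
# Minimal, invented sketch: stripping direct identifiers does not make a
# training record anonymous if quasi-identifiers survive.

training_record = {
    # Name and email already removed.
    "zip": "75008", "birth_year": 1987, "job": "cardiologist", "text": "...",
}

# Hypothetical auxiliary data an adversary might reasonably obtain.
public_directory = [
    {"name": "A. Dupont", "zip": "75008", "birth_year": 1987, "job": "cardiologist"},
    {"name": "B. Martin", "zip": "69003", "birth_year": 1990, "job": "teacher"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "job")

def reidentify(record, directory):
    """Link a 'de-identified' record to named entries via quasi-identifiers."""
    return [p for p in directory
            if all(p[q] == record[q] for q in QUASI_IDENTIFIERS)]

print(reidentify(training_record, public_directory))
# One unique match -> the person is identifiable, so the record is personal data.
```

The point is not the code itself but the logic behind the broad legal test: identifiability is assessed against means reasonably likely to be used, including data the controller does not hold.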
Because the Digital Omnibus does not adjust the definition, the existing broad approach remains the benchmark.
Legitimate Interest and AI Training: Uncertainty Persists
The relationship between legitimate interest and AI training remains one of the most debated issues in European data protection law.
Article 6(1)(f) GDPR allows processing based on legitimate interest, provided that:
- A legitimate interest is clearly identified
- The processing is necessary
- The balancing test favors the controller over the data subject
In theory, AI innovation and technological development may constitute legitimate interests. In practice, the balancing test creates structural challenges.
How can organizations conduct a meaningful balancing assessment at web scale?
How can transparency be ensured when datasets are sourced from diffuse public environments?
How can objection rights be operationalized in large training datasets?
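On the last question, one engineering pattern sometimes discussed is a suppression list applied to the corpus before each training run. The sketch below is a hedged illustration: the data shapes, field names, and hashing choice are assumptions made for this example, not a prescribed compliance mechanism.

```python
# Hedged sketch of a suppression list applied before each training run.
# Field names and data shapes are hypothetical assumptions.

import hashlib

def fingerprint(value: str) -> str:
    """Normalize and hash an identifier so the list stores no raw personal data."""
    return hashlib.sha256(value.strip().lower().encode()).hexdigest()

# Populated from Article 21 objection requests (hypothetical entries).
suppression_list = {
    fingerprint("jane.doe@example.com"),
    fingerprint("objecting-site.example"),
}

corpus = [
    {"source_domain": "objecting-site.example", "author_email": "", "text": "..."},
    {"source_domain": "news.example", "author_email": "jane.doe@example.com", "text": "..."},
    {"source_domain": "blog.example", "author_email": "", "text": "..."},
]

def apply_objections(records):
    """Drop records whose source domain or author identifier is suppressed."""
    return [
        r for r in records
        if fingerprint(r["source_domain"]) not in suppression_list
        and (not r["author_email"]
             or fingerprint(r["author_email"]) not in suppression_list)
    ]

print(len(apply_objections(corpus)))  # 1 of 3 toy records survives the filter
```

Hashing the entries means the suppression list does not itself become a new store of raw identifiers; whether such a mechanism satisfies Article 21 in a given context is exactly the kind of question the Digital Omnibus leaves open.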
The Digital Omnibus was seen as a potential opportunity to clarify this issue. However, the proposed changes do not materially strengthen or codify the ability to rely on legitimate interest for AI training.
Therefore, the legal feasibility of this approach remains uncertain and context-specific. Enforcement risk remains real.
Anonymisation Thresholds Remain High
Another important consequence of maintaining the existing definition is that the threshold for anonymisation remains unchanged.
True anonymisation requires irreversibility, considering all means reasonably likely to be used. In an AI context, this assessment is complex. Advanced models may generate outputs that reintroduce personal elements or allow indirect identification.
As a result, many technically transformed datasets will still qualify as personal data. The distinction between anonymisation and pseudonymisation remains decisive.
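A short, invented example illustrates why that line matters: hashing an identifier yields pseudonymised data, not anonymous data, because anyone with a list of plausible inputs can rebuild the mapping. All values below are made up.

```python
# Invented example: a hashed email is pseudonymised, not anonymous, because
# the mapping can be rebuilt from a list of plausible inputs -- a "means
# reasonably likely to be used" in GDPR terms.

import hashlib

def pseudonymize(email: str) -> str:
    return hashlib.sha256(email.encode()).hexdigest()

stored_token = pseudonymize("jane.doe@example.com")

# Candidate identifiers from leaks, crawls, or guessing (all hypothetical).
candidates = ["john.smith@example.com", "jane.doe@example.com"]
lookup = {pseudonymize(c): c for c in candidates}

print(lookup.get(stored_token))  # -> "jane.doe@example.com": reversible in practice
```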
The Digital Omnibus does not lower that threshold.
Regulatory Fragmentation Across the EU
Because the legislative text remains stable, interpretation will continue to depend on supervisory authorities and courts.
We have already observed differences across Member States regarding scraping practices, transparency obligations, and proportionality assessments. Without further harmonization, divergent enforcement approaches remain possible.
For multinational companies, this means:
- Compliance asymmetries
- Increased litigation exposure
- Strategic uncertainty in AI deployment
The definition of personal data and AI training will therefore continue to be shaped not only by legislation but also by enforcement practice.
AI Governance Becomes a Strategic Imperative
In this context, AI governance is no longer optional.
Organizations should consider:
- Rigorous data mapping of training datasets (a minimal sketch of one such inventory record follows this list)
- Documented legitimate interest assessments
- Scalable transparency mechanisms
- Alignment between GDPR compliance and AI Act obligations
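As a rough illustration of what such data mapping might look like in practice, here is a hedged sketch of a dataset inventory record. The schema and every field name are assumptions for illustration only, not a standard or regulator-endorsed format.

```python
# Hedged sketch of a training-dataset inventory record tying the governance
# items above together. The schema and field names are illustrative
# assumptions, not a standard or regulator-endorsed format.

from dataclasses import dataclass

@dataclass
class TrainingDatasetRecord:
    name: str
    sources: list[str]                # where the data was collected
    contains_personal_data: bool      # outcome of the Article 4(1) analysis
    lawful_basis: str                 # e.g. "legitimate interest, Art. 6(1)(f)"
    lia_reference: str                # pointer to the documented balancing test
    transparency_measures: list[str]  # how data subjects are informed
    objection_mechanism: str          # how Article 21 objections are honored
    ai_act_notes: str                 # alignment with AI Act obligations

inventory = [
    TrainingDatasetRecord(
        name="web-crawl-2025-q1",
        sources=["public web scrape"],
        contains_personal_data=True,
        lawful_basis="legitimate interest, Art. 6(1)(f)",
        lia_reference="LIA-2025-014",
        transparency_measures=["public privacy notice", "crawler disclosure"],
        objection_mechanism="hashed suppression list applied before each run",
        ai_act_notes="training-data documentation for GPAI transparency",
    ),
]
```

However crude, a record like this forces the questions supervisory authorities will ask: which lawful basis, which documented assessment, which objection mechanism.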
In my experience advising boards and AI teams, one recurring expectation has been regulatory relaxation to facilitate AI development. The current trajectory suggests otherwise.
Europe appears determined to preserve a high standard of fundamental rights protection, even in the context of AI innovation.
The Legal Perimeter Has Not Shifted
The definition of personal data remains unchanged. Consequently, the legal feasibility of AI training continues to depend on careful interpretation, documentation, and governance.
The Digital Omnibus does not remove uncertainty around legitimate interest. It does not lower anonymisation thresholds. It does not fundamentally recalibrate the GDPR framework for AI training.
Instead, it confirms that organizations must operate within the existing structure.
The strategic question for businesses is therefore straightforward:
Are they prepared to defend their AI training models under the current GDPR framework, including a robust legitimate interest balancing test?
The answer to that question will define the next phase of AI governance in Europe.
On the same topic, see our article “AI Training Based on Legitimate Interest: Is the Digital Omnibus Proposal Enough?” and our AI Law Journal.

