Personal data, including big data, is a valuable asset for businesses, but how can they get the best out of it in the age of the GDPR?
As part of the series of blog posts on the major changes introduced by the EU Data Protection Regulation, here is an article on how to limit the impact of its restrictions on the usage of data for economic purposes.
The decision of the ECJ on personal data
The ECJ held that a dynamic IP address constitutes personal data not only with respect to the internet service provider (which has the means to link the IP address to the individual behind the address in any case) but also with respect to the operator of a website,
if this website operator has legal means to identify the visitor with the help of additional information from the visitor’s internet service provider.
This was the case under German law, which apparently provides that the website operator can obtain the information required to identify the visitor from the internet service provider via a competent authority that requests the information to prepare criminal proceedings, e.g. in the event of cyberattacks.
Conversely, the definition of personal data does not include data for which the identification of the individual to whom it refers is
- prohibited by law or
- practically impossible on account of the fact that it requires a disproportionate effort in terms of time, cost and man-power so that the risk of identification appears, in reality, to be insignificant.
How to get the best out of data after the GDPR?
The General Data Protection Regulation (GDPR) provides for a definition of “personal data” similar to the one prescribed by the current EU Data Protection Directive 95/46, but it introduces the concept of “pseudonymisation”, which means
“the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person”.
The typical example of pseudonymised data relates to patients’ data processed during the course of a clinical trial where a code is attributed to each of them and technical and organizational measures are put in place to avoid the association of collected data to an individual.
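To make the clinical trial example more concrete, below is a minimal sketch of how a code could be attributed to each patient. It assumes a keyed hash (HMAC-SHA256) as the coding technique, which is only one possible approach and is not prescribed by the GDPR. The function names, field names and sample records are hypothetical. The secret key plays the role of the “additional information” that must be kept separately.

```python
import hashlib
import hmac
import secrets


def make_key() -> bytes:
    """Generate a secret key (assumption: it is stored apart from the data set)."""
    return secrets.token_bytes(32)


def pseudonymise(patient_id: str, key: bytes) -> str:
    """Derive a stable code from an identifier with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymise_records(records, key):
    """Return copies of the records with the direct identifier replaced by a code."""
    coded_records = []
    for rec in records:
        coded = {k: v for k, v in rec.items() if k != "patient_id"}
        coded["code"] = pseudonymise(rec["patient_id"], key)
        coded_records.append(coded)
    return coded_records


# Hypothetical trial data, for illustration only.
key = make_key()
trial = [
    {"patient_id": "jane.doe@example.com", "blood_pressure": 128},
    {"patient_id": "john.roe@example.com", "blood_pressure": 141},
]
coded_trial = pseudonymise_records(trial, key)
```

Whoever holds the key can recompute the codes and re-link the records, which is why data coded this way remains personal data: the protection depends on the measures that keep the key away from those who process the coded records.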
In order to clarify the scope of the definition, recital 26 of the EU Privacy Regulation provides that
To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.
And to ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.
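As a rough illustration of what “singling out” can mean in practice, the sketch below counts how many records share the same combination of indirect identifiers and flags those that are unique. This is a simplified technical heuristic under assumed field names and sample data, not the legal test set out in the recital.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)


def singled_out(records, quasi_identifiers):
    """Return the records whose quasi-identifier combination is unique in the set."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        rec
        for rec in records
        if sizes[tuple(rec[q] for q in quasi_identifiers)] == 1
    ]


# Hypothetical visitor data, for illustration only.
visitors = [
    {"postcode": "20121", "birth_year": 1980, "device": "phone"},
    {"postcode": "20121", "birth_year": 1980, "device": "laptop"},
    {"postcode": "20122", "birth_year": 1975, "device": "phone"},
]
unique_rows = singled_out(visitors, ["postcode", "birth_year"])
```

A record that is unique on a few attributes is easier to tie back to a person once it is combined with other sources, which bears on how much cost and time identification would require.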
As previously discussed, anonymisation requires a rather burdensome process which is likely to deprive data of any content that might have an economic value. It would avoid the applicability of data protection laws but, in my view, it is not a viable option for businesses.
Big data and aggregate data
Aggregate data can fall outside the scope of data protection laws, provided that it is not possible to link the data to the individual by any means and the data has not been collected in breach of privacy laws.
In this respect, it is important to point out that the GDPR provides for exceptions to the restrictions applicable to the storage and usage of data when processed for “statistical purposes”, as occurs in the case of Big Data. This exception still requires the adoption of technical and organisational measures to comply with the principle of “data minimisation” (i.e. only data strictly necessary to achieve the purpose shall be used). But it is an extremely valuable tool when it comes to the processing of data which does not need to be linked to an individual.
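The following sketch shows one way the processing for statistical purposes described above might look in code: only the two fields needed for the statistic are read, and only group-level figures are returned. The minimum group size is an assumption added for illustration (the GDPR does not set such a threshold), and the function name, field names and sample data are hypothetical.

```python
from collections import defaultdict


def aggregate_spend(records, group_field, value_field, min_group_size=5):
    """Return count and mean per group, dropping groups below min_group_size."""
    totals = defaultdict(lambda: [0, 0.0])  # group -> [count, running sum]
    for rec in records:
        # Data minimisation: only the grouping and value fields are read.
        bucket = totals[rec[group_field]]
        bucket[0] += 1
        bucket[1] += rec[value_field]
    return {
        group: {"count": count, "mean": total / count}
        for group, (count, total) in totals.items()
        if count >= min_group_size  # assumed threshold, not a legal requirement
    }


# Hypothetical purchase data, for illustration only.
purchases = [
    {"customer": "c1", "region": "north", "spend": 10.0},
    {"customer": "c2", "region": "north", "spend": 20.0},
    {"customer": "c3", "region": "north", "spend": 30.0},
    {"customer": "c4", "region": "south", "spend": 50.0},
]
stats = aggregate_spend(purchases, "region", "spend", min_group_size=3)
```

Here the single “south” customer is left out of the result, because a group of one would reveal that individual's spend. Whether such output is no longer linkable to an individual still has to be assessed case by case.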
Segregated or encrypted data?
The definition of pseudonymised data seems to keep within the scope of personal data even data that is segregated from other data in a manner that prevents the identification of individuals. However, what if
a privacy by design approach is implemented so that the cost and the amount of time required for identification becomes disproportionate?
The above is also valid in relation to data whose level of encryption is such that it is highly unlikely that the processed data can be linked to an individual.
The implementation of the solution above requires not only keeping the data separate, but also a privacy impact assessment (and the consequent approval by the data protection authority) of the technology that the company intends to use, as well as a privacy by design approach throughout the whole usage of data. Indeed, technological evolution obliges companies to continuously monitor the level of encryption and segregation of data, in order to avoid, for instance, new technologies making the processed data identifiable.
This is a very hot topic which is particularly relevant for telecom operators as well as any Internet of Things, FinTech or other company whose business relies on the processing of large amounts of personal data. What is your view on the above?