Cologne ruling: Meta may also use sensitive data for AI training
If users share sensitive information such as health data on social networks, operators can use it directly to train AI systems.
At the end of May, the Cologne Higher Regional Court (OLG) disappointed experts with its announcement that Meta may use the data of all adult European users of Facebook and Instagram to train its own AI applications, such as the large language model LLaMA. Federal Data Protection Commissioner Louisa Specht-Riemenschneider, for example, criticized the decision, handed down in expedited proceedings, as “unbelievable”. That Meta and other social network operators may now also use sensitive data, such as health information or details of political, religious or sexual preferences, for purposes deemed legitimate, such as training AI systems, is likely to anger privacy advocates even further.
In principle, Article 9 of the General Data Protection Regulation (GDPR) prohibits the processing of sensitive personal data. However, this ban is qualified by a long list of exceptions: it does not apply, for example, if the data subject has “manifestly made public” the sensitive data.
According to the recently published reasons for the ruling, the OLG considers this requirement to be met if users post such information about themselves in their public profile on a social media service or share it in a public post (case reference: 15 UKl 2/25). In such a case, the average user must be aware “that this data can be viewed by anyone and can even be found using search engines”.
Third-party data is not a problem either
Even where sensitive information about third parties is affected, the ban under Article 9 GDPR does not apply, according to the Cologne judges. In their view, the prohibition would first have to be “activated” in the specific case by a request from the third party concerned to remove their data from the published post or from the training data set. However, the 15th Civil Senate is not entirely sure on this point: it has indicated that it intends to refer the question to the European Court of Justice (ECJ) should the case proceed to a trial on the merits.
The Higher Regional Court bases its view on the fact that, in the AI Act, the European legislator has expressly recognized the necessity of training large generative AI models with “huge amounts of text, images, videos and other data”. It has long been known that companies obtain AI training data through web scraping, which always carries the risk of unintentionally and indiscriminately processing sensitive data. With the AI Act, lawmakers are also pursuing the goal of securing a “pioneering role” for the EU in generative artificial intelligence.
De-identification instead of anonymization
According to the ruling, the social media operator sued by the North Rhine-Westphalia consumer advice center also credibly demonstrated that it is “taking measures to de-identify the data records”. Full names, email addresses, telephone numbers, national identification numbers, user IDs, credit/debit card numbers, bank accounts, vehicle registration numbers, IP addresses and postal addresses are included only in unstructured and “tokenized” form. This does not amount to anonymization: in particular, the faces of people recognizable in photos are not obscured. Nevertheless, the Senate has no doubt that the procedure “will reduce the overall risk”.
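The ruling does not spell out how this de-identification works technically. As a rough illustration only, the following Python sketch shows what pattern-based “tokenization” of identifiers could look like in principle; the patterns, token names and the tokenize_pii function are illustrative assumptions, not a description of Meta's actual pipeline.

import re

# Illustrative assumption: replace recognizable identifiers with
# placeholder tokens. This is de-identification, not anonymization:
# free-text hints ("my cardiologist said ...") and faces in photos
# pass through untouched.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d /-]{7,}\d"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def tokenize_pii(text: str) -> str:
    # Substitute each matched identifier with its placeholder token.
    for token, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{token}>", text)
    return text

print(tokenize_pii("Reach me at jane.doe@example.com or +49 221 1234567."))
# Output: Reach me at <EMAIL> or <PHONE>.

Even after such substitution, the records remain personal data as long as individuals can still be identified from the remaining content, which is why the court speaks of a reduced overall risk rather than of anonymity.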
Speaking to heise online, Valentino Halim, a lawyer at the commercial law firm Oppenhoff, welcomed the “business-friendly decision that 'enables' AI technologies”. In parts, however, he finds the court's reasoning quite surprising. It remains to be seen whether the ECJ would “support the narrow interpretation of the ban on processing sensitive data” in any referral proceedings. Data and consumer protection advocates are urging users to make use of their right to object.
(dmk)