Meta is allowed to use your public data – OpenAI too, by the way

The Irish Data Protection Authority has confirmed that Meta's actions are legal. The objection only helps to a limited extent.

Smartphone with the Facebook, Facebook Messenger, Instagram, WhatsApp and Oculus apps in front of the Meta logo

(Image: mundissima/Shutterstock.com)


From May 27, 2025, Meta wants to use the public data of its platform users to train its AI models. In other words, whatever anyone else in the world can already see will then flow into AI training. You have until May 26 to object. However, it is questionable how much this will really help.

First of all, there is certainly no harm in lodging an objection. After all, an AI model does not forget, and objecting is very easy to do in your account settings under data protection. There is an input field where you can give reasons for your objection, but you don't even have to.

However, if you do not object, your public posts, i.e., photos on Instagram, posts on Facebook and the like, will be used. The Irish Data Protection Commission (DPC) has now confirmed that this procedure is acceptable. Meta had discussed directly with the DPC what it intended to do and how it wanted to proceed, so it is hardly surprising that the green light was given. At the request of the DPC, Meta is also said to have made improvements, for example regarding the identification of individuals.

Nevertheless, consumer protection organizations and the data protection organization Noyb are trying to take action against it.

It must be made clear that Meta will not read private messages, let alone use them for AI training. WhatsApp chats and Instagram direct messages remain end-to-end encrypted and private. The exception is conversations that users will have in future with Meta's AI chatbot, Meta AI, which can also be activated in group chats, for example. These parts of a conversation are then considered public and are used by Meta for training. Google and OpenAI handle this better: with their chatbots it is possible to have conversations whose content does not flow back to the company behind them.


However, public posts on Instagram and Facebook could just as well be accessed by OpenAI, Google, Anthropic, a Chinese AI company such as Alibaba, or who knows who else, and used to train their AI models. So prohibiting Meta from using this data does not mean that nobody else will. OpenAI has repeatedly emphasized that it uses all freely available content on the internet to train its models. The Sora video generator is likely to have been trained with videos from Instagram, among other sources. Because YouTube videos are also said to have been used, Google complained about this practice. Meta cannot technically prevent this, but it can stipulate in its terms of use that user data may not be used by third parties.

We have asked Meta whether it has done this and whether it is taking further measures; the answer will be added as soon as we receive it. We have also asked OpenAI about its use of public posts from Meta's platforms.

Then there is the problem that posts written by someone in the EU are also visible worldwide. In the USA, using such posts for AI training is permitted. Posts from the EU would therefore have to be marked in such a way that a crawler knows it is not allowed to use them. Unless, of course, one argues that the use is permissible simply because the crawling happens in the USA, which is precisely what the AI companies could do. The next step would then be to demand that posts from the EU no longer be visible in the USA at all.
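To make the crawler-marking idea a little more concrete: in practice, such opt-outs are usually expressed via a site's robots.txt, which well-behaved crawlers such as OpenAI's GPTBot or Google's Google-Extended check before fetching a page. The check is purely advisory, which is exactly why it cannot technically prevent scraping. The following minimal Python sketch uses only the standard library; the domain, URL and the assumption that these bots honor the rules are illustrative, not a description of how Meta actually marks posts.

# Minimal sketch: checking robots.txt the way a rule-abiding AI crawler would.
# The site and URL are placeholders; robots.txt is advisory only, a crawler
# can simply ignore it.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

for bot in ("GPTBot", "Google-Extended", "CCBot"):
    allowed = rp.can_fetch(bot, "https://www.example.com/some-public-post")
    print(f"{bot}: {'may fetch' if allowed else 'blocked by robots.txt'}")

Because these signals apply per site rather than per author or per region, they cannot distinguish an EU user's post from anyone else's, which is the core of the problem described above.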

But it's not all hopeless. The GDPR stipulates, for example, that it must be possible to rectify data about a person. There is also the right of access, which gives EU citizens the right to know what data about them is stored and used. Both are hardly possible with current AI models. However, this applies not only to Meta, but to all AI models.

Meta also argues that the data of EU citizens must be used because otherwise there would be no AI models that do justice to local languages and culture. If AI models were trained exclusively on US data, everything would probably sound very exuberant, exciting, and great. So you might have to ask yourself whether you want to take advantage of the benefits of AI or protect your data. Of course, this applies not just to Meta, but also to OpenAI, Google, and the rest.

(emw)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.