Prompts, phone, location, social media: LLMs gather tons of personal data
A study shows how ruthlessly LLM providers handle personal data: when collecting it from social media, in their apps, and when passing it on to third parties.
According to a study by data protection company Incogni, the French AI provider Mistral handles users' private data most carefully with Le Chat (9.8 points), followed by OpenAI's ChatGPT (9.9 points) and xAI's Grok (11.2 points). Meta.ai scored worst at 15.7 points; a higher score means a greater privacy risk.
(Image: incogni)
The publishers of the study criticize the general lack of privacy protection in AI applications: “The potential for unauthorized data sharing, misuse and exposure of personal data has increased faster than privacy watchdogs or regulators could keep up.” Ordinary users cannot assess the practices of AI companies and the associated risks: to determine whether their personal data has been exposed, they would need access to the training data and information about “ongoing interactions”.
Model training, always with personal data
A good part of the Incogni analysis deals with training data and notes succinctly that “all platforms directly or indirectly state that they use user feedback and publicly available private data to train their models.”
Whether the operators also use user input for training is often difficult to determine. According to the report, only Anthropic states that it generally does not use user input to train its Claude models. ChatGPT, Copilot, Mistral and Grok offer an opt-out, but it applies only to prompts: none of the AI providers examined lets users shield their own personal data from other sources, such as social media. On the contrary: Incogni points to reports that model training simply ignores barriers such as those set in robots.txt (see the sketch below).
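For context: robots.txt is a purely advisory file of crawl rules served at a site's root, and it is up to each crawler to consult it. A minimal sketch in Python, assuming a placeholder site example.com and OpenAI's documented crawler token GPTBot, shows the check a compliant crawler performs and an ignoring one simply omits:

```python
# Sketch: how a well-behaved crawler consults robots.txt before fetching.
# "GPTBot" is the user-agent token OpenAI documents for its training
# crawler; the URLs below are placeholders for illustration.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's advisory crawl rules

# A compliant crawler runs this check before every request; a crawler that
# "ignores barriers", as the cited reports describe, simply skips it.
if rp.can_fetch("GPTBot", "https://example.com/profile/jane-doe"):
    print("crawl allowed")
else:
    print("disallowed by robots.txt (advisory only, not enforced)")
```

Nothing in the protocol prevents a crawler from fetching the page anyway; compliance is entirely voluntary, which is why a Disallow rule is no protection for personal data.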
However, AI operators do not only use the web as a source of personal data:
- “Security Partners” (ChatGPT, Gemini and DeepSeek)
- Marketing partners (Gemini and Meta)
- Financial institutions (Copilot)
- Unspecified databases (Claude)
- Unspecified data brokers (Microsoft)
Only Inflection AI states that its Pi assistant draws personal data exclusively from “publicly accessible sources”.
Privacy policies, often confusing
What matters for those affected (which means almost everyone, since hardly anyone has no personal data somewhere on the web) is how transparent the AI operators are about what they do. The report praises Anthropic, OpenAI, Mistral and xAI for making this information easy to find and read. It criticizes the opposite in the case of Google, Meta and Microsoft, which do not even have a dedicated privacy policy for their AI products.
(Image: incogni)
From the respective privacy policies, Incogni also determined which data the AI companies pass on. Microsoft, for example, shares user prompts with “third parties that perform online advertising for Microsoft or use Microsoft technology to do so.” Almost all providers share prompts with “service providers”, and many explicitly with “law enforcement agencies” (DeepSeek, Gemini, Grok and Meta).
Apps: phone numbers and location data
Using the AI apps opens up even more opportunities for data-hungry operators to collect personal data:
- Precise location data and addresses (Gemini and Meta)
- Telephone numbers (DeepSeek, Gemini and Pi)
- Photos (Grok, also shared with third parties)
- App interactions (Claude and Grok, the latter also shared with third parties)
According to the report, Microsoft claims not to collect any data via its Android app, but does so via the iOS app. The analysts therefore chose to evaluate the company on the basis of the iOS app.
(who)