Opinion: AI threatens jobs – but it's the data that is really in danger!
LLMs, AI chatbots and agents make personal data accessible in simple, clearly worded form. At the same time, their operators collect vast amounts of it.
(Image: Elnur/Shutterstock)
Fear of AI is often theoretical in nature, above all the worry that AI will eat my job. In the banking sector, analysts predict that 200,000 jobs will be lost. Set that against the one million unfilled positions expected in the public sector by 2030, and we may yet be grateful for AI there.
For the vast majority of people, on the other hand, the danger that already exists today concerns personal data – the data of everyone who appears in any form on the internet: on websites, in reports, in databases or in social media profiles. The large language models suck all of this in during training and, when prompted, spit it out again in friendly language. Like a fat whale, they filter personal data from the flood instead of plankton and digest it.
One could argue that the big players in the industry, such as Google, Meta and Microsoft, have always been sucking this data into their big bellies anyway. But the AI-driven exploitation of this data reaches a new dimension in performance, in intelligent compilation, and in how easily almost anyone can analyze it.
No small data fish
The extent of data harvesting is shown by an investigation by the data protection company Incogni, according to which LLM operators collect data not only on the internet – often ignoring the robots.txt – but also from diffuse "databases" (Claude), from "marketing partners" (Gemini and Meta), from "data brokers" (Microsoft), from "security partners" (ChatGPT, Gemini and DeepSeek), or from cell phones. There, location data (Gemini and Meta), telephone numbers (DeepSeek, Gemini and Pi) or even photos (Grok, which also shares them with third parties) are ready for consumption. None of the LLM operators in the study offered an opt-out option for this.
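To illustrate what "ignoring the robots.txt" means: robots.txt is a purely voluntary convention by which a site asks crawlers to stay away; whether an AI crawler complies is entirely up to its operator. A minimal Python sketch of that mechanism follows – the site URL is a hypothetical placeholder, while GPTBot and CCBot are the user-agent tokens of OpenAI's and Common Crawl's crawlers:

```python
# Minimal sketch: check what a site's robots.txt asks of different crawlers.
# "https://example.org" is a placeholder; GPTBot (OpenAI) and CCBot (Common Crawl)
# are real crawler user-agent tokens. Compliance remains voluntary.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.org/robots.txt")
rp.read()

for bot in ("GPTBot", "CCBot", "Googlebot"):
    allowed = rp.can_fetch(bot, "https://example.org/profile/susi")
    print(f"{bot}: {'may crawl' if allowed else 'asked not to crawl'}")
```

The point of the sketch is the asymmetry: the site can only ask; nothing in the protocol forces a crawler to check this file at all.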
The situation is different for prompt and chat data: here, ChatGPT, Copilot, Mistral and Grok let users declare that the AI companies may not use their communication data. Anthropic generally refrains from using user input for its Claude models. All other companies in the study remain silent on this point.
With this monstrous data lake, the LLM operators violate fundamental principles of data protection, for example the rule that personal data from different contexts should not simply be merged ("client separation"), precisely to prevent such pools of sensitive data from forming. Yet this is exactly what is happening in LLM training right now.
In theory, this makes it easy to answer questions such as "Who has Susi been on vacation with in recent years?" if Susi has occasionally posted that somewhere semi-publicly. AI front ends such as ChatGPT may filter out such questions, but the answers are still sitting in the LLM's belly. And with clever prompting, more can be teased out than the LLM operator would like – security experts know this well.
The AI delivers correlations here that would have been far more laborious, or even impossible, with a traditional Google search. This danger is more concrete than the threat of job loss, which for most people remains theoretical for now. Precisely for that reason, in the age of AI, social media friends in particular should pay more attention to who or what is ingesting their semi-public data – it could be a data shark with a whale's belly.
(who)