AI training for police and others: "Everything regulated in very broad, general terms"

Bettina Gayk, North Rhine-Westphalia's State Commissioner for Data Protection and Freedom of Information (LDI NRW), criticizes the police's use of Palantir software. In an interview with heise online, she speaks of excessive surveillance.


Surveillance camera at a train station (symbolic image).

(Image: Marie-Claire Koch / heise medien)


Stricter police laws, the planned EU chat control, and other measures – the list of initiatives through which the state wants to gain access to ever more data is growing. Controversial systems from Palantir are also being used. At the same time, transparency towards citizens is being restricted.

Bettina Gayk is the State Commissioner for Data Protection and Freedom of Information in North Rhine-Westphalia.

(Image: LDI NRW / Caroline Seidel)

In this interview, Bettina Gayk, the State Commissioner for Data Protection and Freedom of Information in North Rhine-Westphalia, explains the risks these developments pose to our freedom and why she finds the new AI rules in the Police Act concerning.

Lately, there's been a flood of new legislative proposals – from state police laws to amendments to the Code of Criminal Procedure. All aim to make more data, including from the internet, usable for security authorities. Why do you view this critically?

That's difficult to answer in such broad terms, because it always depends: for what purpose do I want to use which data? Is the use suitable for that purpose, is it proportionate? Where does the data come from, and how reliable is it? What we are currently observing is that many additional security powers are being created, but without thorough and careful consideration of which situations, which criminal prosecutions, and which danger-prevention measures actually require this data, and whether the benefit is proportionate to the fundamental rights of those affected.

The more data we use and search, and the more openly we do so, the more everyone can be affected: anyone can fall under false suspicion because of incorrect data, or end up in other difficult situations. It is true that crimes are also committed in the digital world, so I must be able to investigate there too, but not without limits. The constitution applies to our investigative authorities as well, and it demands that the means be proportionate to the fundamental rights of citizens.

A concrete example is the use of AI and analysis software like Palantir's by the police. In NRW, the Police Act was amended accordingly, and the police are now allowed to train AI with their data. What risks do you see there?

Everything is regulated in very broad, general terms. The law states quite generally that the police may use data for AI training. They are supposed to do so with anonymized data, but if that would require disproportionate effort, they can also work with clear data, that is, non-anonymized data. I cannot assess at all what effects training with clear data will have on those affected. If all I care about is the text of a letter, it might be acceptable to work with non-anonymized data, provided the AI only produces a polished text without any reference to individuals. But if I start training the system with data in order to supposedly achieve better danger prevention, that approach reaches its limits.


So if I consider the entire police data system, which has already been made searchable through the DAR system, i.e. Palantir, and then add an AI on top, it can become boundless. If the AI draws conclusions from personal training data that are possible but not necessarily correct, it is no longer comprehensible at all who is accused of posing a threat, on the basis of which data, legitimately or not. One must keep in mind that police databases are anything but reliable. Deletion deadlines are not always strictly observed. Nor is the factual accuracy of the data guaranteed: witness statements may be wrong, misunderstandings may have led to data being recorded incorrectly, or a suspicion may not have been confirmed in the course of proceedings yet still be present in police records, and so on.

In addition, police records contain not only data on criminals or so-called 'people of interest', but also on witnesses, lawyers, or individuals who file a complaint or call the police emergency number. Given the sheer volume of data the police hold, they are very likely to invoke the disproportionate effort of anonymization and let all of it flow into AI training unfiltered. What conclusions the AI then draws from outdated, sometimes incorrect data is almost a matter of luck. It becomes particularly critical if data from individuals who were never suspected of any crime or danger also flows into the training. AI only generates possible, and sometimes fanciful, results. There is no guarantee of objectively correct results. The unreliable data basis also calls the meaningfulness of AI results into question.

A provision that states only in such general terms that the police may train AI with their data is useless, because it does not sufficiently protect fundamental rights and, in particular, does not counteract the possible adverse effects on wrongly suspected individuals. Much more detailed scrutiny is needed, and it must be legally stipulated which data may be used to train which type of AI under which specific conditions, and what measures must be taken to protect those affected. If one starts training AI systems on unfiltered data sets to gain insights for danger prevention, it can be terrible for people who come under scrutiny because of false conclusions and incorrect data. I'm not saying that AI in the police is never an option, but it must be reserved for serious offenses and cannot become a standard police tool. Yet that is exactly what it is increasingly becoming.

As a supervisory authority, do you have the ability to examine this system in live operation? So far, there has been criticism that hardly anyone from the outside has seen it.

We were not offered that, but we don't need to be offered it. We can simply do it. And this year we have at least planned discussions with the LKA, the State Criminal Police Office, because we want to know what is specifically planned now that the expansion to AI has been approved for a system like DAR/Palantir. So we will look at that very closely. It will probably also be useful to have individual case queries to the system demonstrated to us in live operation. So far we have only seen it in presentations, before the system went live. But now it is in operation, and we will certainly take random samples to see what is possible with it.

The EU Commission is once again planning chat control to find depictions of child abuse. The argument is always the protection of children. Why doesn't that convince you?

The striking example cited is always sexual violence against children. That tugs at the heartstrings. These are, of course, horrific crimes, just like murder or attacks. But that is still no reason to suspect the whole world of child abuse or similarly serious offenses and to monitor everyone. That simply goes too far.

Furthermore, what the voluntary chat scanning of recent years has turned up has not been so strikingly convincing that one could say: now we are catching them all. For one thing, the people who do such things are often so technically savvy that they are less likely to exchange material via such chats. And one must also weigh how effective such a measure is in relation to the monitoring of everyone who is active in chats online. For us it was never in question that there is a privacy of correspondence and of telecommunications, and that monitoring is only permitted with a court order when there are grounds for suspicion. Today almost the whole world is active in chats, and suddenly one is supposed to be able to look into all this private communication without cause. Something has fundamentally shifted there.

Besides security authorities, companies in particular collect vast amounts of data. Many users give their consent, for example, for payment services like PayPal or in apps, without knowing the details. How problematic is that?

We are currently examining a case involving an app with around 300 providers all over Europe. How reputable they are and what happens to the data there is something the party transmitting the data can hardly verify anymore, and individual users certainly cannot. So when I am presented with an almost endless list of companies, I have to ask myself: what are they all doing with my data?

Facebook, for example, gives sweeping, general descriptions of purposes and user groups in its privacy policy. Anything and nothing can hide behind that. As an affected person, I cannot truly trace where my data goes or what these 300 to 600 companies do with it. The core feature of data protection, which is supposed to ensure that data remains controllable, is purpose limitation: as a rule, data may only be used for the purpose for which it was collected. With processing based on consent to incomprehensible data practices, that is hardly controllable.

The EU is currently planning various data spaces, for example, the European Health Data Space (EHDS) for health data but also for mobility, finance, and more than ten other areas of life. Does this development worry you?

Yes, absolutely. Even the draft laws sometimes overstep boundaries. The health data space, for example, allows broad use of very sensitive data. Its use by so-called health service providers went too far, and the protection of mental health data was too weak. Or take the mobility data space: it may well enable many convenient applications for traffic management, but it also harbors the danger of making movement profiles of road users possible. Here it is important to keep an eye on protecting data from which conclusions about the movements of individuals can be drawn.

And yes, this is a trend whose proliferation we are fighting against. There is so much data processing that it is hard to know where to fight and what dangers the available data poses to privacy. For almost everything there are good reasons for meaningful use, but almost everything can also be used to the detriment of those affected. And the danger that this will happen is quite high. That worries me a lot. Often the people we want to protect are themselves too unconcerned, and the concerns of data protection advocates rarely hold a majority in public opinion. I fear the public will only rethink when something happens: a data scandal that significantly affects many people.

Another EU project is the digital EUDI Wallet. It is intended to allow citizens to identify themselves securely. However, private providers like Google also want to get involved. Where do you see the dangers?

We are involved because we are one of the complaints authorities to be integrated into the wallet. If someone fears data misuse, they should be able to reach us directly from the wallet. But the crucial point is how it will ultimately be regulated: which data may I, as a company, access? There is still great uncertainty. Certificates are actually planned, in which companies specify which data they want to access and for which purposes, and which are then approved. But under the latest EU regulation, this approval is only optional, i.e. no longer mandatory but left to voluntary action. That makes it very difficult to control, which is an unfortunate situation for us. Even if Germany provides for this certificate, as I hope, but other member states do not, and a provider is based in Italy or Ireland, it is hardly possible to verify whether data was accessed legitimately or not.

Parallel to this data collection, there is an opposing trend: the restriction of freedom of information, as recently in Berlin. How does that fit together?

Freedom of information does not have as long a tradition in Germany as data protection. Until the 2000s, there was the doctrine of the "arcane sphere", according to which the state could keep certain things secret in order to maintain its credibility externally. This was broken by the argument: you are elected by us, you have our democratic legitimacy, so of course you must tell us how you arrived at your decisions and on what basis. It obliges those who govern to make the bases of their decisions transparent. This follows an old and good tradition in Scandinavia and the Anglo-Saxon countries.

Such tendencies toward restriction, toward a return to unnecessary and thus undemocratic official secrecy in the sense of the outdated arcane sphere, can increasingly be observed. This is, of course, also related to the fact that politics sees itself under growing pressure and fears being exposed. I do not think this can be the right way. In Berlin in particular, restrictions were introduced under the pretext of ensuring security. That was not necessary, because a threat to security is already a ground for refusing access to information in every freedom of information law. The specific changes to the Berlin Freedom of Information Act are so extensive that a new secrecy is creeping in, in areas where democratically elected governments should in fact be accountable to their voters for their work.

When you take all these developments together – the data-gathering frenzy of states and corporations, the lack of transparency on the other side – what is your personal conclusion?

It doesn't necessarily scare me, but it definitely worries me, because I don't really see a lever to limit it. Because more and more data is available, using it without limits quickly becomes an automatism. We have reasonably stable legislation in Europe, but data flows on the internet do not stop at national borders. We have American legislation that claims the right to access data stored here. We have a case in which advertising data, including location data, was traded and ended up with a data broker in the USA. Then the data is beyond our control. It can be sold to intelligence agencies and who knows who else. That is quite frightening, because it can no longer be controlled in detail.

At the same time, our workload is increasing enormously. The volume of complaints is such that we can hardly manage many things we would need to tackle systematically. We had a 68 percent increase in complaints from 2024 to 2025. That is really exhausting and hinders systematic action. In this respect, it is also completely absurd that some politicians are now considering abolishing the state data protection commissioners and scaling back data protection. The opposite approach is the right one.

(mack)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.