Unprovoked mass surveillance: Why mathematical laws speak against it
The AI Act opens a back door for biometric mass surveillance. Experts warn that this is unreasonably risky. They call for common statistical sense.
(Image: Collage c’t)
What is the maximum error rate that facial recognition for biometric mass surveillance may have without causing disproportionate harm? 1 percent? 0.1 percent? 0.01 percent? How many errors may a so-called CSAM scanner (Child Sexual Abuse Material) be allowed, i.e. a system that analyses the entire communication of every EU citizen to determine whether it contains depictions of sexualized violence against children? These and similar questions are being asked by MEPs and committee members in Brussels, Berlin and other capitals who are deciding on the introduction of such blanket surveillance methods.
But these are the wrong questions, warn experts such as Gerd Gigerenzer, Director of the Harding Center for Risk Literacy at the University of Potsdam, and political scientist Vera Wilde of the Hertie School Centre for Digital Governance. Even a seemingly negligible error rate of 0.001 percent, i.e. a hit rate of 99.999 percent that feels practically indistinguishable from one hundred percent, can cause devastating damage to society. In reality, the hit rate is significantly lower, especially once the technology has to prove itself in real life, i.e. at train stations and other public places. The trilogue negotiations on the AI Act have considerably weakened the originally planned de facto bans on real-time surveillance and emotion recognition. This makes mass surveillance possible through the back door. In the following, we show why mathematical and statistical laws speak against this practice.
A deep neural network is now behind almost all image recognition and, in particular, facial recognition. These systems are trained to identify people or other objects using sample photos. The training is controlled by an optimization function that minimizes the system's error rate: after each batch of training data, it adjusts the parameters so that the prediction error shrinks over time, until the system can no longer improve. In the end, the initially generic neural network has developed into a system that extracts characteristic features from photos in order to distinguish between faces, for example.
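How such a training loop nudges the parameters can be sketched in a few lines of Python. The PyTorch framework, the toy network architecture, the number of identities and the `train_loader` data source below are illustrative assumptions; the article does not describe any concrete implementation.

```python
# Minimal sketch of the training loop described above (assumptions: PyTorch,
# a toy architecture, and `train_loader` supplying sample photos with labels).
import torch
import torch.nn as nn

model = nn.Sequential(            # stand-in for a deep face-recognition network
    nn.Flatten(),
    nn.Linear(128 * 128 * 3, 256),
    nn.ReLU(),
    nn.Linear(256, 1000),         # e.g. 1000 known identities (assumption)
)
loss_fn = nn.CrossEntropyLoss()   # the "optimization function": measures the prediction error
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def train_one_epoch(train_loader):
    for images, labels in train_loader:   # sample photos with known identities
        logits = model(images)            # current prediction
        loss = loss_fn(logits, labels)    # how wrong the prediction is
        optimizer.zero_grad()
        loss.backward()                   # compute how each parameter should change
        optimizer.step()                  # nudge the parameters to reduce the error
```

Each pass over the training data repeats this adjustment until the loss stops falling, which is the point at which the text says the system "can no longer improve any further".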
- The trilogue negotiations on the AI Act have considerably weakened the originally planned de facto bans on real-time surveillance and emotion recognition.
- Such mass screenings for rare problems can have devastating consequences for society.
- This is due to the unreasonably high false-positive rate of such systems, which statistical and mathematical laws make almost impossible to correct.
But no matter how good facial recognition becomes and how complex the internal processes are: the supposed recognition is always just a prediction, i.e. a probability value. If, for example, a photo from a surveillance camera is fed into the system and compared with a terrorist database, the system extracts the characteristic biometric features and compares them with those of the stored terrorist photos. The result is a ranking list: with 90 percent certainty it is person 5, with 60 percent it is person 10, and the match with person 25 is only 0.8 percent. The system would therefore choose person 5, but it could also be wrong. If no match with a known terrorist is high enough (for example, if the best match reaches only 45 percent), the neural network will probably suggest the category "no terrorist". This assessment can also be wrong, for example if the person is disguised with glasses, a thick beard and a cap.
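The ranking-and-threshold logic of this example can be written down as a short sketch. The scores, the 45 percent cut-off and the function name are either taken from the illustration above or invented for it; they do not describe any real watchlist system.

```python
# Illustrative sketch: compare a probe photo's match scores against a watchlist
# and either pick the best match or fall back to "no terrorist" below a threshold.
from typing import Optional

THRESHOLD = 0.45  # illustrative cut-off from the example above

def best_match(similarities: dict[str, float]) -> Optional[str]:
    """similarities maps watchlist entries to match scores between 0 and 1."""
    person, score = max(similarities.items(), key=lambda item: item[1])
    if score < THRESHOLD:
        return None           # "no terrorist" - but this prediction can be wrong, too
    return person             # e.g. "person 5" at 0.90 - also only a prediction

print(best_match({"person 5": 0.90, "person 10": 0.60, "person 25": 0.008}))
# -> person 5
print(best_match({"person 5": 0.40, "person 10": 0.30}))
# -> None  (below the threshold: classified as "no terrorist")
```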
Small error rate, huge numbers
So there is always an error rate, made up of two types of error: false positives ("terrorist", even though the person is not) and false negatives ("not a terrorist", even though the person is). Even when it comes to fishing unknown CSAM content out of communications, there is considerable room for interpretation and therefore various sources of error: the AI must, for example, assess the age of the persons depicted and recognize whether a criminal act has been committed. Such systems naturally also have an overall error rate composed of false positives and false negatives. In practice, operators must always find the best possible balance for the intended use. If, for example, a bank tunes its fraud-detection scanner to be extremely sensitive in order to catch every possible attempt at fraud, there will be an unacceptable number of false alarms: too many transfers or withdrawals will be blocked and too many customers annoyed. A CSAM or grooming scanner tuned too sensitively would in turn report an unacceptable number of legal depictions in which the age and/or the act is difficult to assess. Similar to fraud detection, it would therefore have to be balanced so that the false-positive rate falls to an acceptable level. But that is a problem.
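A toy calculation makes this tuning trade-off visible: the same detector scores, evaluated once with a sensitive and once with a strict threshold. The scores and ground-truth labels below are made up purely for illustration.

```python
# Sketch of the threshold trade-off; scores and labels are invented for illustration.
scores = [0.10, 0.30, 0.55, 0.70, 0.85, 0.95]        # detector output per case
is_fraud = [False, False, False, True, False, True]  # assumed ground truth

def count_errors(threshold):
    false_pos = sum(s >= threshold and not y for s, y in zip(scores, is_fraud))
    false_neg = sum(s < threshold and y for s, y in zip(scores, is_fraud))
    return false_pos, false_neg

print(count_errors(0.5))   # sensitive setting: (2, 0) - more false alarms, nothing missed
print(count_errors(0.9))   # strict setting:    (0, 1) - fewer alarms, one real case missed
```

Lowering the false-alarm count only works by letting more real cases slip past the detector, which is exactly the dilemma described in the following paragraphs.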
A statistical dilemma arises both in the search for terrorists and in the search for images of abuse: the proportion of terrorists in the total population is vanishingly small, and likewise the proportion of abuse images is tiny compared with the huge number of photos and videos being sent back and forth. If the detector's false-positive rate is as high as its false-negative rate, many innocent citizens end up in the investigators' sights for every correctly identified terrorist.
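A back-of-the-envelope calculation shows how the tiny base rate turns even a minuscule false-positive rate into a flood of false suspicions. The population size, the number of real targets and the hit rate below are illustrative assumptions, not figures from the article; only the 0.001 percent error rate picks up the example quoted at the beginning.

```python
# Illustrative base-rate arithmetic; all absolute numbers are assumptions.
population = 80_000_000   # roughly one large EU country (assumption)
real_targets = 400        # assumed number of actual terrorists in that population
hit_rate = 0.99           # assumed share of real targets the system flags

# 0.001 % is the seemingly negligible error rate quoted above;
# 0.1 % is a less optimistic but still generous real-world assumption.
for false_positive_rate in (0.00001, 0.001):
    true_hits = real_targets * hit_rate
    false_hits = (population - real_targets) * false_positive_rate
    print(f"FPR {false_positive_rate:.3%}: "
          f"{false_hits:,.0f} innocent people flagged alongside {true_hits:.0f} real targets")
```

Even under the optimistic assumptions the false suspicions outnumber the real hits (about 800 versus roughly 400); with the less optimistic error rate they outnumber them by a factor of about 200.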
Because such suspicion can have considerable consequences and investigators cannot check thousands of reports for accuracy every day, the automatic prediction machines must be balanced in such a way that they deliver as few false positives as possible. However, this is at the expense of the actual goal: as a result, more actual criminals fall through the cracks. This is because the overall error rate remains the same: fewer false positives mean a higher false negative rate.
"We can't protect everyone from everything," concludes political scientist Dr. Vera Wilde in her essay "Rock, Paper, Statistics: Mass screening for rare problems endangers society". In it, she attempts to raise awareness of the mathematical laws that make mass screening for such unevenly distributed groups absurd.
In addition to the problem of rarity described above, i.e. belonging to an extremely small group, Wilde names two further conditions under which such systems fail: first, when verifying the results involves considerable costs and risks; in that case, the resulting harm outweighs the benefits for the majority of the people concerned. And second, when the results cannot be verified by scientific tests at all; then the initial uncertainty remains.