Unprovoked mass surveillance: Why mathematical laws speak against it

Consequences of a surveillance infrastructure


Gerd Gigerenzer has also calculated the damage that a system with a supposedly acceptable error rate can cause, using the example of the controversially discussed EU-wide chat control. A corresponding draft law provides for all communication to be scanned on the client side for CSAM (child sexual abuse material) and for grooming, i.e. contacting children with pedocriminal intent. It is not only EU Commissioner Ylva Johansson who is fighting for this; behind the scenes, a whole army of lobbyists from the US surveillance industry is pushing for it as well. Such a law would effectively destroy end-to-end encryption, create a surveillance infrastructure, and could even endanger children and young people who share intimate photos with each other.

Gigerenzer's example calculation is based on the following assumptions: in Germany alone, around 3 billion messages are sent via WhatsApp every day. For our calculations, we have rounded this population down to 2 billion to make the absolute figures easier to illustrate. If CSAM or grooming were hidden behind just 0.0001 percent of them, i.e. one in a million messages, that would amount to a total of 2,000 messages per day. With a rather unrealistically high hit rate of 99.9 percent, the system would detect 1,998 of these messages. Assuming the false alarm rate were correspondingly low at 0.1 percent, however, it would at the same time wrongly classify almost 2 million of the legal photos and videos as abusive. Incidentally, even if we assume a significantly higher proportion of CSAM/grooming (around one in every thousand messages), the blatant disproportion between false positives and true positives remains.
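For readers who want to retrace the arithmetic, here is a minimal Python sketch of the example calculation; the variable names and the assumption that the 99.9 percent figure applies equally to flagging illegal content and to passing legal content are ours, not Gigerenzer's.

# Base-rate calculation for the chat-control example, assuming the
# 99.9 percent figure holds for both error directions.
messages_per_day = 2_000_000_000   # rounded population from the text
prevalence = 0.000001              # one CSAM/grooming message in a million
sensitivity = 0.999                # share of illegal messages that get flagged
specificity = 0.999                # share of legal messages that get passed (assumption)

illegal = messages_per_day * prevalence       # 2,000 messages per day
legal = messages_per_day - illegal

true_positives = sensitivity * illegal        # 1,998 correct flags
false_positives = (1 - specificity) * legal   # almost 2 million wrong flags

print(f"true positives:  {true_positives:,.0f}")
print(f"false positives: {false_positives:,.0f}")
print(f"false flags per correct flag: {false_positives / true_positives:,.0f}")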

The dimensions and ratios are difficult to imagine and depict, so here is a comparison: if you put each WhatsApp message in a standard envelope around 20 centimeters wide, the two billion messages lined up end to end would cover a distance of 400,000 kilometers. That is roughly the distance between the Earth and the moon. The chain of false-positive envelopes would be almost 400 kilometers long and would also make it far into space, namely to the orbital altitude of the International Space Station (ISS). The true positives, on the other hand, would only stretch 399 meters, roughly the distance to the nearest bakery. Or, to stick with height: this chain of envelopes would reach to the top of a skyscraper that only ranks mid-table among the world's tallest buildings, namely the Guiyang International Financial Center T1, which sits in 42nd place on that list.
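The envelope chains are simple multiplication; a short sketch of the conversion (the 20-centimeter envelope width is taken from the text, the message counts from the calculation above):

# Converting message counts into envelope-chain lengths.
envelope_m = 0.20                                # envelope width in meters

total_km = 2_000_000_000 * envelope_m / 1000     # all messages
false_km = 1_999_998 * envelope_m / 1000         # false positives
true_m = 1_998 * envelope_m                      # true positives

print(f"all messages:    {total_km:,.0f} km  (Earth-moon distance)")
print(f"false positives: {false_km:,.0f} km  (ISS orbital altitude)")
print(f"true positives:  {true_m:.1f} m  (about one skyscraper)")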

According to an internal EU Commission report published in 2022, the images classified as CSAM would then be checked by humans and the false positives sorted out by hand. According to this report, the accuracy of the current grooming detection technology is only 90 percent, meaning that only 9 out of 10 reported messages actually contain attempts by paedophiles to gain the trust of children. For the detection of unknown images of abuse, the EU Commission assumes "over 90% accuracy and 99% precision", as stated in the impact assessment for the draft regulation.
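Accuracy and precision measure different things: accuracy is the share of all decisions that are correct, precision the share of flagged messages that really are illegal. Applied to the example numbers above, a scanner can score 99.9 percent on the first and only a fraction of a percent on the second, as a small sketch shows:

# Precision vs. accuracy, using the confusion-matrix counts from the example.
tp = 1_998              # correctly flagged illegal messages
fp = 1_999_998          # legal messages wrongly flagged
fn = 2                  # illegal messages the scanner missed
tn = 2_000_000_000 - tp - fp - fn   # legal messages correctly passed

precision = tp / (tp + fp)                   # share of flags that are correct
accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all decisions correct

print(f"precision: {precision:.2%}")   # about 0.10 percent
print(f"accuracy:  {accuracy:.2%}")    # about 99.90 percent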

In reality, however, the respective error rates are likely to be significantly higher, as the figures are based on the manufacturers' own information. The providers have not disclosed to the EU Commission which data they used to test their systems, so it is impossible to estimate how well the technology works under realistic conditions.

Even a significantly better system that was correct 99.999 percent of the time would still produce ten false positives for every correctly identified abusive message. There is therefore no way around significantly reducing the false alarm rate. To achieve this, however, the system would have to be tuned so insensitively that it would wave through the majority of abuse images, and it would no longer fulfill its purpose.
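A quick check of the ten-to-one ratio, again assuming that the 99.999 percent figure describes both error directions:

# False-positive-to-true-positive ratio for a 99.999 percent system.
messages = 2_000_000_000
illegal = messages * 0.000001            # 2,000 illegal messages per day
rate = 0.99999                           # assumed for both error directions

tp = rate * illegal                      # ~2,000 correct flags
fp = (1 - rate) * (messages - illegal)   # ~20,000 false flags

print(f"{fp / tp:.0f} false positives per true positive")   # 10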

Many mass screenings intended to detect rare events, such as diseases or epidemics, are subject to these statistical laws: detecting cancer at the earliest possible stage, sending people infected with corona into quarantine before they trigger an outbreak, or fishing wanted criminals out of the crowd via real-time video recognition at Berlin Central Station. The statistical phenomenon described above can occur in all of these scenarios: a lot of effort, few hits, and considerable potential damage from misdiagnoses.

In medicine, the dilemma is addressed by not sending everyone from 0 to 99 for screening, but only the population group in which the disease occurs with comparatively high probability. For example, only people over 50 are invited to bowel and breast cancer screening. The prevalence, i.e. how frequently the phenomenon occurs in different groups of the overall population, is thus taken into account. As a result, the benefit-harm calculation is much more favorable: significantly fewer healthy people are put at risk per life saved, for example through radiation exposure or further (invasive) examinations.
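In statistical terms, restricting screening to a risk group raises the positive predictive value (PPV), i.e. the probability that a positive result is a true case. A minimal sketch with illustrative numbers (the prevalences and test characteristics below are assumptions for the sake of the example, not real screening data):

# Positive predictive value of the same test at two different prevalences.
def ppv(prevalence, sensitivity=0.90, specificity=0.95):
    """Probability that a positive test result is a true case."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)

# Illustrative values: 1 in 10,000 across the whole population versus
# 1 in 100 within an older risk group (assumed numbers, not medical data).
print(f"whole population: PPV = {ppv(0.0001):.1%}")   # about 0.2 percent
print(f"risk group:       PPV = {ppv(0.01):.1%}")     # about 15 percent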

When image or other detectors are used in mass screenings to find rare problems, a statistical dilemma arises: even at the very high accuracy of the exemplary CSAM and grooming scanner described in the text, a disproportionately high number of false positives occurs (left). If the system is therefore tuned to be less sensitive, so that it detects only 80 percent of unknown abusive chat messages, for example (middle), there would still be over 200 false positives per detected CSAM message. Even at an unacceptably low detection rate of just 40 percent, there would be 75 false positives for every correctly identified message (right).