Research: AI language models such as ChatGPT discriminate against East Germans
A study shows that ChatGPT & Co. also adopt structural judgement patterns and reproduce prejudices. This extends to body temperature.
For the study, AI language models were asked to evaluate characteristics such as "diligence" or "xenophobia" in German federal states.
(Image: Julia Bergmeister)
Large AI language models such as ChatGPT and its German counterpart LeoLM are not neutral, but systematically reproduce and reinforce regional prejudices against East Germans. This is the conclusion reached by computer science professor Anna Kruspe and her colleague Mila Stillman from Munich University of Applied Sciences in the study "Saxony-Anhalt is the Worst". Saxony-Anhalt in particular performed poorly in the tests, as the title of the analysis suggests.
The researchers investigated the extent to which large language models (LLMs) take on the clichés and prejudices against the East German federal states that are widespread in society. Such systems for generative AI are trained with huge amounts of data from the internet and the media. The study focused on how the AI evaluates the 16 German states when asked about various positive, negative, and even neutral characteristics. The impetus for the study came from earlier contributions by scientists who had demonstrated discrimination by AI on a global level.
Systematic discrimination
The researchers asked the models used to assess characteristics such as attractiveness, likeability, arrogance, and xenophobia for people in each federal state. The results show a clear and systematic tendency for the AI to always assign "lower" values to residents of East German states than to West Germans. For positive traits such as diligence or attractiveness, East Germans consistently received lower scores than West Germans. Paradoxically, the models also gave lower scores for negative traits such as laziness. This led to some contradictory assessments, such as the finding that East Germans are both less hard-working and less lazy.
Videos by heise
The experts conclude that the AI thus adopts the socially learnt pattern of rating the East worse across the board without maintaining logical consistency. The reaction of the models to the query of objective, neutral characteristics is particularly revealing. To test whether the so-called bias also occurs without any cultural reference, the researchers asked the LLMs about the average body temperature of the inhabitants of each federal state.
(Image:Â Kruspe / Stillman)
Here, too, the eastern German states scored "worse", as they were often assigned a lower body temperature. Stillman explains this phenomenon as follows: "The model has learnt that the figures are simply always lower in certain areas than in others." The AI therefore stubbornly repeats a pattern that it has learnt once and that is stored in its training data. This happens even if the queried feature does not provide a basis for regional differentiation. The distortion is therefore inherent in the model and not generated by the question. The English version of GPT-4 behaved conspicuously in a different way, but at least it considers all German citizens to be equally undercooled.
Real danger of discrimination
The authors urgently warn of the real disadvantages that these prejudices reproduced by the AI can have for Ossis in everyday life. If LLMs are used carelessly in application procedures, credit checks, or other assessment systems, they could lead to the educational background, work experience, or qualifications of people from the East being assessed less favorably for no good reason. The models could, for example, give negative weight to subtle differences in language patterns that are influenced by origin.
To reduce this bias, Kruspe and Stillman tested so-called "debiasing prompts". These are explicit instructions to the AI to make fair and origin-neutral judgements. However, the conclusion is sobering: "In order to filter out prejudices, one solution could be to explicitly state in prompts that the person's origin should have no influence," explains Kruspe. "Unfortunately, this is not reliable." The bias is so deeply rooted in learnt patterns that simple instructions are not enough to eliminate it completely. According to guidelines from the German government and the EU, for example, the use of AI should be fair and non-discriminatory.
(map)