AI and Hallucinations: Why Are So Many Answers Wrong?
False information, inconsistent connections, and even fabricated sources: the still unsolved problem of AI models.
(Image: tadamichi/Shutterstock.com)
It is well known that AI hallucinates and makes mistakes, and equally well known that AI providers have no solution to the problem. A new study by the European Broadcasting Union (EBU) now shows just how many answers are simply wrong when it comes to news. According to the study, every third answer from common chatbots contains errors. The reasons are varied, but they have one thing in common: there is currently no solution for any of them.
It is striking that not only is information fabricated; according to the study, the cited sources were also often completely invented. That makes the verification users would need to do even harder. That said, the study notes that the results have improved compared to a previous study: the share of incorrect answers fell from around half to about 37 percent. The chatbots examined were Copilot, ChatGPT, Perplexity, and Gemini.
The latter performed surprisingly poorly. In principle, Google's chatbot Gemini draws on both real-time search and the Knowledge Graph knowledge base, and Google has the most experience in searching the internet for information. However, Gemini is not to be equated with the AI Overviews or the AI Mode in Search; those could perform better.
In total, 45 percent of all answers contained at least one error. Incorrect sources were the most common problem, at 31 percent. The study's authors write that it is a problem above all for publishers when content is attributed to them that did not come from them, for example false information. The study, led by the BBC, involved 22 public service broadcasters from 18 countries, working in 14 languages.
AI Models and Their Fallacies
The reasons for false information lie in the AI models themselves. They sometimes link knowledge incorrectly: a chatbot, for example, has already turned a court reporter into a murderer because the AI matched names and articles, but in a completely wrong way. Answers are based on probabilities and on the learned proximity of pieces of information. One and one can sometimes come out as three. And since an AI model cannot count, reproducing the correct number of 'r's in the word 'strawberry' is still a problem. If it works, it is because the model found the information, not because it suddenly learned to count.
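As a minimal sketch of the difference (the subword split below is assumed for illustration and not taken from any real tokenizer): ordinary code operates on characters and counts exactly, while a language model only sees subword tokens and predicts the next one from learned probabilities.

    # Sketch: character counting vs. token prediction (assumed tokenization).
    word = "strawberry"

    # Ordinary code counts letters exactly.
    print(word.count("r"))  # -> 3

    # A language model instead sees subword tokens, e.g. (assumed split):
    tokens = ["straw", "berry"]
    # It predicts the next token from learned probabilities over such units and
    # never inspects single letters, so an answer about letter counts comes from
    # memorized text patterns rather than from actual counting.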
Another problem can be training data that already contains false information, which an AI model learns as is. Even a distorted representation of something can lead AI to draw false conclusions. It is well known that everything available is used as training data. Click workers then process and clean this data so that, among other things, illegal content is recognized as such, a burdensome job for the people who have to view this material. Mercor, for example, is one such company, with around 30,000 people worldwide performing this task. Company valuation: 10 billion US dollars. Customers include OpenAI and Anthropic. Meta relies on Scale AI, in which it has also invested financially.
Reinforcement Learning as an Advantage and a Disadvantage
AI models also strive to find answers that please the user and that are most likely to be correct. This can lead them to make something up rather than admit they don't know something. It is behavior that appears very human but simply follows from how the models are built and trained: AI models are essentially rewarded for correct answers. On the one hand, this improves the quality of answers over the course of the learning process. On the other hand, it has the disadvantage that this kind of people-pleasing leads to misinformation. It is like a person guessing on a multiple-choice test rather than leaving the answer blank: a blank answer is guaranteed to score nothing, while guessing gives a chance of being right that depends on the number of options.
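The incentive can be made concrete with a small calculation. The scoring scheme below (one point for a correct answer, zero for a wrong answer or for abstaining) is an assumption for illustration, not the reward scheme of any particular provider:

    # Expected score for guessing vs. abstaining on a multiple-choice question.
    # Assumed scoring: 1 point if correct, 0 if wrong, 0 if no answer is given.

    def expected_score_guess(num_options: int) -> float:
        """Expected score when picking one of num_options answers at random."""
        return 1.0 / num_options

    def expected_score_abstain() -> float:
        """Expected score when giving no answer at all."""
        return 0.0

    for options in (2, 4, 6):
        print(f"{options} options: guess {expected_score_guess(options):.2f}, "
              f"abstain {expected_score_abstain():.2f}")

    # Guessing never scores worse than abstaining under this scheme, which mirrors
    # why a model trained on such rewards prefers producing some answer over
    # admitting that it does not know.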
In a second study, the BBC found that more than a third of adults in the UK say they would fully trust AI. Among those under 35, it is half of those surveyed. The study also revealed that consumers blamed not only the chatbots for incorrect information but also the linked sources, even when those sources had nothing to do with the errors.
The authors of the current study call on AI providers to prioritize the problem that false information poses for the credibility of AI chatbots. They also demand more control for publishers over how their content is processed, including, for example, a uniform citation method. Finally, the authors raise the question of how AI providers can be held responsible for content. The difficulty is that answers are not generally reproducible. If a chatbot says the banana is blue, that does not mean this false information will appear in every other question about bananas. In such a simple case, the AI model could be instructed that the claim banana=blue is incorrect and should not be repeated. However, that could also result in the model refusing to talk about bananas at all afterwards. The options for control are limited.
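Why answers are not generally reproducible can be shown with a toy sampler. The words and probabilities below are invented for illustration and stand in for the far larger distributions a real model samples from:

    import random

    # Toy next-word distribution for "The banana is ..." (invented numbers).
    candidates = ["yellow", "ripe", "green", "blue"]
    weights = [0.6, 0.25, 0.1, 0.05]

    # Chatbots typically sample from such a distribution instead of always taking
    # the most likely word, so repeated runs can produce different answers.
    for _ in range(5):
        print(random.choices(candidates, weights=weights, k=1)[0])

    # Most runs say "yellow", but "blue" can still appear occasionally. Suppressing
    # that one wrong continuation risks changing how the model answers other
    # questions about bananas, which is the control problem described above.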
The authors also advocate for users to have a better understanding of how AI works.
(emw)