Google's AI Overviews are still not to be trusted
Since the spring, Google searches have been serving up AI Overviews in more and more countries. But the hallucinated oddities are only slowly diminishing.
Google's AI Overviews: an AI-generated image from the search results, presented as if it were real.
(Image: Todd Fong / Midjourney / heise online)
On the plane to Tokyo, I want to find out what options there are for getting from Narita Airport into the city by train. The in-flight Wi-Fi seems to classify me as a US citizen – or at least as someone Google deems eligible for the AI Overviews it rolled out in the first regions in the spring. So when I search for an "airliner train" in the Japanese capital (because that is how I misremembered the term "Skyliner"), the AI Overview generated by Google's Gemini system, which is supposed to summarize web results as intelligently as possible, appears right at the top.
What it says (see screenshot) sounds pretty good: there are "different ways" to get to Tokyo by "airliner train", including the "Skyliner", the "Narita Express" and the "Jodan Skyflyer Ultra Express". Below this are a few statements, with links, explaining what the different trains are. While I know the "Skyliner" and the "Narita Express" as competing rail services, the "Jodan Skyflyer Ultra Express" makes me wonder. Has the airport launched a new special train? The name sounds very Japanese. So I do what I always do – and google it. As it turns out, the term apparently appears on only one website: the blog of a Japan fan named Todd Fong, who shares travel tips and his photography.
The dream of flying
In the blog post, he describes a trip on a "flying train" of that name in a short, story-like text. "For anyone who has been living under a rock for the last few months: The Skyflyer Ultra Express is the world's first public transportation system that flies from station to station. It's basically a flying train that travels to Tokyo, Takanawa Gateway, Kawasaki and Yokohama stations." The story is rounded off with some fairly pretty AI images generated with Midjourney, which Fong uses to illustrate his post. Only in the last paragraph can you read what the story is really about: "Can we be honest here? The story you have just read is part of my 'Illusions of Japan' series. These are works that are part true and part fiction – about people, places and events in Japan." In other words: Google's AI Overview fell for the fiction – or rather, it does not seem to "read" texts to the end.
(Image: Screenshot / heise online)
The confusing thing about this case is that the AI Overviews not only missed completely obvious markers of fiction (it is hard to imagine a clearer disclosure by the author himself), but also seem to assign a high trust score to practically unknown websites. The fact that the search term apparently appears only once in the index (at least under my Google cookie personalization) also casts doubt on the algorithm. How can it be that two facts backed by numerous web sources (Skyliner, Narita Express) are supplemented here with a third, completely false one ("Skyflyer") that has only a single source?
Keeping people on Google for as long as possible
It was probably smart of Google to scale back the display of AI Overviews in the summer, shortly after their introduction. But don't make the mistake of thinking the company will abandon the feature. It is too lucrative for that. It has a promising future in the search business because it keeps users on Google itself for longer, instead of sending them into the web via a direct link from the search engine, as used to be the case. Even before the Gemini era, Google had spent the last decade integrating more and more content directly into its own pages, below and above its own advertising. Weather or exchange rates? Google handles those itself. News? Gets summarized. Cooking recipes or song lyrics? Sure, served up right there. This regularly caused an outcry in the publishing industry. With the AI Overviews, Google has gone one step further, because the apparent Gemini intelligence reduces the motivation to venture from the search engine into the web even more. And sources are identified only by tiny icons, which is likely to shrink the willingness to click even further. According to studies, AI Overviews together with Google's own content already account for up to 75 percent of page content on mobile devices and around 67 percent on desktops. It is increasingly unlikely that many people will keep scrolling down to the so-called organic search results.
Generative AI is known to work with probabilities: which token – i.e. word fragment – comes next? The result is a complex web of language understanding that even many AI researchers cannot fully explain, because its inner workings are far too complicated. One thing is clear: you can never trust the output 100 percent, because no matter how much you train such a model, hallucinations will still occur. That is inherent in how generative AI works. The insidious part is that the output always sounds plausible, so you cannot tell which 20 percent is wrong and which 80 percent is right. For its AI Overviews, a critical application, Google therefore uses Retrieval Augmented Generation (RAG), in which the generative AI can draw on current search data. And yes, content generated this way even gets a source with a link to the web.
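To see why a single dubious page can still end up in such a "grounded" answer, here is a minimal RAG sketch in Python. It is an illustration, not Google's actual pipeline: the toy index, the word-overlap scoring and the prompt template are all invented for this example. The mechanism it shows is real, though: the model only sees what the retriever hands it, and the retriever ranks by relevance, not by truth.

```python
import re
from collections import Counter

# Toy "index" standing in for a web-scale search index (invented data).
DOCUMENTS = [
    "The Skyliner connects Narita Airport with central Tokyo.",
    "The Narita Express runs from Narita Airport to Tokyo Station.",
    "The Skyflyer Ultra Express is a flying train to Tokyo.",  # fiction!
]

def tokenize(text: str) -> Counter:
    """Lowercase word counts; punctuation is stripped."""
    return Counter(re.findall(r"\w+", text.lower()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query.
    Note what is missing: any notion of trustworthiness."""
    q = tokenize(query)
    return sorted(
        DOCUMENTS,
        key=lambda doc: sum((q & tokenize(doc)).values()),
        reverse=True,
    )[:k]

def build_prompt(query: str) -> str:
    """Stuff the retrieved snippets into the prompt. The language model
    then just summarizes this context -- fiction included, if it ranked high."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

print(build_prompt("airliner train to Tokyo"))
```

In this sketch, the fictional "Skyflyer" line actually scores highest, simply because it shares the most words with the query – roughly the trap my "airliner train" search appears to have walked into.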
Hasn't been funny for a long time
However, these links are often not clicked – or they point to untrustworthy content, as in my example above. There was much rejoicing in the AI scene over Google's clever move to sign an exclusive contract with Reddit for training and RAG (which later turned out not to be exclusive at all, since OpenAI also receives the data). Thanks to good moderation, the reasoning went, the content would be good too. Here, too, that turned out not to be entirely true: among other things, the AI Overviews were unable to distinguish Reddit satire from real information. Eating rocks, gluing cheese onto pizza – that was just the tip of the iceberg. Google CEO Sundar Pichai is already promising even more AI in search next year. The "advanced reasoning capabilities of Gemini 2.0" are then supposed to answer "more complex and multi-step questions". It is only a matter of time before the feature reaches the EU. According to Pichai, AI Overviews already reach a billion people, who will then be able to pose "completely new types of questions". The AI Overviews have "quickly become one of the most popular search features of all time". And, of course, the whole thing is also supposed to become "agentic".
Let's not kid ourselves: the hallucination problem affects all generative AI across the board; it is and remains the technology's birth defect, even two years after the "big bang" of large, Transformer-driven language models – the release of ChatGPT. The same goes for the 80/20 problem: only experts can say beyond doubt what is actually wrong, or a great deal of research is needed to rule out errors. A certain counter-current is already forming – both among users, who are looking for ways to stop AI Overviews from being displayed, and among professional content creators. The latter recently witnessed how Apple's AI summaries of notifications can likewise produce extremely strange output – output that is no longer just funny. The pattern is typical here too: lots of generated content that is correct, plus the one tiny but serious error that then gets overlooked. A "good enough" product looks different – who is supposed to rely on something like that?
Too much unleashed on humanity too quickly
I don't have a real solution to the problem. Or maybe one after all: how about not unleashing high-risk features on large parts of humanity right away? What about the risk assessment in the EU's AI Act? But even then: Apple was massively criticized by the stock market and by observers because the company took so long to introduce Apple Intelligence and still has not rolled out all the announced features. And what happened? Even this "taking time" was not enough to get things right. Of course, everything is neatly labeled as a beta or even as an "experiment" that users should not rely on.
But why release it at all? There is a real danger that false information from trust-inspiring sources (Apple, Google's search engine) will be taken at face value – and that real accidents will happen as people act on this content. Maybe that is why people are starting to use YouTube and TikTok as search engines. But those, too, have long been flooded with AI garbage. May the spirit of Alan Turing be with us in the coming year.
(mma)