Wikipedia: Bot traffic increasingly disguises itself as human
The free online encyclopedia is recording an increasing number of automated accesses and is appealing to operators of LLMs, search engines, and social media.
Automated access to the online encyclopedia Wikipedia is likely far more extensive than previously thought. New analysis methods have revealed that some accesses that the Wikimedia Foundation initially attributed to human visitors were actually made by bots specifically designed to behave in ways that circumvent Wikipedia's detection systems.
As Marshall Miller from the Wikimedia Foundation writes in a blog post, the portal recorded significantly higher access numbers in May and June. However, after an update to the bot detection systems for website visitors, he and his colleagues now attribute a large portion of this additional traffic to automated accesses.
Frequent scraping for LLMs
It was striking that a large part of the additional accesses came from Brazil. The access numbers from March to August were then re-evaluated. The result: in May and June, a massive number of bots accessed Wikipedia whose behavior was designed to make them appear as human visitors and to evade the corresponding detection systems. Such bots are often used to scrape Wikipedia articles, i.e., to retrieve and store their content. The data is then often used as training material for LLMs; search engine crawlers such as Google's are also typically behind such automated accesses. Last year, the Wikimedia Foundation already blamed AI scrapers for a drastic increase in the bandwidth consumed by downloads of multimedia content.
According to the new figures, the number of human visitors has also decreased significantly: in recent months it was around eight percent lower than in the same months of 2024.
No surprise, but a problem
Wikipedia officials are not surprised that the number of human visitors is decreasing even further. They attribute the development to the general trend of information retrieval via LLMs, search engines, and social media.
Nevertheless, this trend is becoming an increasing burden for Wikipedia, and the aforementioned bots contribute to it. Wikipedia relies on donations and on volunteers who write, update, and correct articles. Both are declining as people increasingly turn to other sources of information. However, the Wikimedia Foundation believes that Wikipedia information still frequently surfaces through LLMs in particular. This is because almost all leading LLMs are trained on content scraped from Wikipedia, which was created with the help of donations to Wikipedia and the work of its volunteer authors.
The Wikimedia Foundation sees LLMs, search engines, and social platforms as welcome additional information channels. Its appeal to their operators, however, is: encourage your audience to visit Wikipedia more often. Only then will the basis of the information that is often used on these channels be secured.
(nen)