Analysis: AI crawlers can overload servers
Meta's AI crawlers generate more traffic than those of Google and OpenAI combined, while ChatGPT dominates real-time fetcher traffic on the web.
Emergency in the data center
(Image: vchal/Shutterstock.com)
Chatbots and the AI models behind them are changing not only how we search for information but the internet as a whole. This poses a number of challenges for web service and content providers. A report by the cloud platform Fastly shows how AI bots are driving the growth of automated traffic. Fastly itself is working on crawler-management solutions for its own customers.
AI bots fall into two main groups: AI crawlers, which systematically scan the internet to collect training data for AI models, and fetcher bots, such as ChatGPT agents, which retrieve content from the web in real time to answer user queries.
The crawlers' intensive content scraping can lead to server overload. According to Fastly's analysis, 80 percent of all AI bot traffic observed between April and July 2025 was attributable to such crawlers. Meta, the parent company of Facebook, Instagram and WhatsApp, is the biggest culprit, accounting for 52 percent of all AI crawler requests. This volume is significantly higher than that generated by Google (23 percent) and OpenAI (20 percent) combined.
Because fetcher bots such as ChatGPT agents retrieve content on demand, they can generate enormous request volumes. In one case, such a bot made 39,000 requests per minute to a single website during peak load, according to the study. Even without malicious intent, this load causes massive bandwidth consumption and, much like a DDoS attack, can bring the origin server to its knees.
"A worrying trend"
"A worrying trend is the surge in traffic from large-scale AI bots," write the authors, who analyzed more than 6.5 trillion monthly requests. In one case, a single AI crawler reached a peak value of around 1000 requests per minute. This could represent a considerable burden for websites that rely on database queries or provide interfaces for searching Git repositories such as Gitea. For such systems, even short spikes in activity without effective bot controls or scaling measures could lead to slowdowns, timeouts or disruptions.
Another challenge the authors identify is a lack of transparency: there is no standardized way to verify bots, which makes it difficult for security teams to distinguish legitimate bots from potentially malicious traffic. To make verification easier, bot operators should publish their IP address ranges or support verification methods such as reverse DNS lookups.
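The classic pattern for such a check is a reverse DNS lookup followed by a forward confirmation: resolve the requesting IP to a hostname, check that the hostname belongs to the domain the bot operator documents, then resolve that hostname again and confirm it points back to the same IP. The Python sketch below illustrates the idea; the domain suffix and IP are placeholders that would have to be replaced with values from the respective operator's documentation.

```python
import socket

def verify_bot_ip(ip: str, expected_suffixes: tuple) -> bool:
    """Reverse DNS lookup plus forward confirmation of the requesting IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)        # reverse lookup (PTR record)
    except socket.herror:
        return False
    if not host.endswith(expected_suffixes):         # hostname must be in the operator's domain
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(host)  # forward lookup of that hostname
    except socket.gaierror:
        return False
    return ip in addrs                               # must resolve back to the original IP

# Placeholder values -- real suffixes come from the bot operator's documentation.
if verify_bot_ip("192.0.2.1", (".examplebot.example.com",)):
    print("request comes from a verified bot network")
```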
Geographical bias
Almost 90 percent of the AI crawler traffic analyzed originates from North America. If AI models are trained predominantly on content from the USA, their output reflects that imbalance, which leads to distortions.
While the large AI companies in the USA invoke the fair-use principle, arguing that they may use copyrighted works as long as doing so serves the general public, in Germany the legal basis is currently the text and data mining provision of copyright law, which permits the use of copyrighted works for research purposes.
Anyone who wants to keep crawlers away from their own content can declare this in the robots.txt file. However, robots.txt is only a request and cannot technically stop crawlers; efforts to develop new standards are already underway. How finely bots can be distinguished is also questionable: Google, for example, only lets website operators either exclude its crawlers from AI training and real-time search or allow both.
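As an illustration, a robots.txt along the following lines asks OpenAI's GPTBot crawler and Google's Google-Extended token (which governs use of content for AI training) to stay away, while leaving ordinary search crawling untouched. The user-agent tokens are the ones currently documented by those operators and may change; compliance remains voluntary.

```
# Opt out of AI training crawlers (advisory only -- compliance is voluntary)
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Regular search indexing remains allowed
User-agent: *
Allow: /
```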
Without clear standards for checking bots, it will be almost impossible for companies to control data traffic and protect their infrastructure, warns Arun Kumar, a security researcher at Fastly. Automated traffic must be managed with the same precision and urgency as any other infrastructure or security threat.
(mki)