Cloudflare blocks AI crawlers by default unless they pay for scraping

Cloudflare can now protect websites from crawler access by default. However, AI companies can also pay operators for content scraping.



AI companies often access website content via web crawlers without asking permission, for example to power internet searches or to train AI models. So far, operators have gained nothing from this apart from a higher load on their servers. Cloudflare now wants to block such AI crawlers by default, and will soon also offer AI companies the option of paying website operators for scraping if the content is important enough to them.

The internet and network company has offered its customers the option of blocking AI crawlers for some time. Now, however, this scraping block is activated by default when a new domain is created. Cloudflare had already taken further measures before this: its AI Labyrinth is designed to ward off unwanted bots by redirecting web crawlers to a honeypot instead of letting them scrape website content.

A developer presented a similar solution at the beginning of this year. The Nepenthes tool is a tar pit for AI web crawlers: it lures crawlers into an endless labyrinth or even feeds their endless hunger for data with masses of pointless content. But this is not just about copyright protection, because AI crawlers are increasingly becoming a problem for servers. In January, AI bots paralyzed a Linux news site and others.


Cloudflare wants to counter this issue by blocking AI crawlers. According to the company statement, website operators should decide for themselves “whether AI crawlers can access their content at all and how this material may be used by AI companies.” The reasoning is that AI companies use the content for their own purposes without involving the authors, who earn less from their work as a result. “Original content is what makes the internet one of the greatest inventions of the last century,” says Matthew Prince, co-founder and CEO of Cloudflare. “That's why it's imperative that creators continue to create it.”

One way of financing websites could be “pay per crawl”, as Cloudflare explains in its blog. This initiative allows website operators to charge AI companies for access to their content, instead of blocking AI crawlers completely or allowing full access without compensation. For this, Cloudflare uses the almost forgotten HTTP status code 402 (Payment Required). If an AI bot encounters this response, the AI company in question can contact Cloudflare or the operator to enter into a paid agreement, instead of simply being rejected with HTTP 403 (Forbidden).
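From a crawler's point of view, the difference between the two status codes could be handled roughly as in the following Python sketch. It is only an illustration of the idea described above, not Cloudflare's implementation: the function name and the returned labels are hypothetical, and only the meaning of the status codes 402 and 403 comes from the HTTP standard.

```python
from http import HTTPStatus


def interpret_crawl_response(status_code: int) -> str:
    """Map an HTTP status code to a crawler's next step.

    The returned labels are hypothetical and exist only for this sketch.
    """
    if status_code == HTTPStatus.PAYMENT_REQUIRED:  # 402: access is possible, but only against payment
        return "negotiate-payment"
    if status_code == HTTPStatus.FORBIDDEN:  # 403: access is rejected outright
        return "blocked"
    if 200 <= status_code < 300:  # any success code: content may be fetched
        return "fetch-allowed"
    return "other"
```

A crawler that receives 402 therefore learns that a paid agreement is an option, while 403 gives it no such signal.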

However, website operators can also allow exceptions for individual AI bots if they have already made appropriate agreements or support the purposes of this special scraping. This pay-per-crawl program is currently in a closed beta phase, but interested website operators can still register with Cloudflare.
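The three options open to an operator (allow, charge, block) can be pictured as a small decision rule. The following Python sketch is a simplified assumption of how such a policy might be modeled; the class `CrawlPolicy`, its fields, and the bot names are hypothetical and do not reflect Cloudflare's actual configuration or API.

```python
from dataclasses import dataclass, field
from http import HTTPStatus


@dataclass
class CrawlPolicy:
    """Hypothetical per-site policy for AI crawlers (illustration only)."""

    # Bots the operator has exempted, e.g. because an agreement already exists.
    allowed_bots: set[str] = field(default_factory=set)
    # Whether the operator offers paid access instead of a flat block.
    paid_access_offered: bool = True

    def status_for(self, bot_name: str, is_ai_crawler: bool) -> HTTPStatus:
        """Return the HTTP status a request from this client would receive."""
        if not is_ai_crawler or bot_name in self.allowed_bots:
            return HTTPStatus.OK  # regular visitors and exempted bots get the content
        if self.paid_access_offered:
            return HTTPStatus.PAYMENT_REQUIRED  # 402: invite a paid agreement
        return HTTPStatus.FORBIDDEN  # 403: reject without an offer


# Usage with made-up bot names:
policy = CrawlPolicy(allowed_bots={"ExampleBot"})
exempt_status = policy.status_for("ExampleBot", is_ai_crawler=True)
unknown_status = policy.status_for("OtherBot", is_ai_crawler=True)
```

In this sketch, how a client is recognized as an AI crawler is deliberately left out; the flag `is_ai_crawler` is simply passed in.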

(fds)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.