Reddit locks out search engines and AI bots unless they pay

Reddit is blocking major search engines and AI data collectors; only Google is exempt. The likely reason is Reddit's AI licensing deal with Google.

Reddit logo on a smartphone screen (Image: Ascannio/Shutterstock.com)

By Frank Schräer
This article was originally published in German and has been automatically translated.

Internet users who search for certain topics on Bing or DuckDuckGo, for example, will not see any current content from Reddit; searches on Google, by contrast, do surface new Reddit results. The reason: Reddit has started locking out search engines and their web crawlers if they do not reach a licensing agreement with the platform. Google is likely exempt because it has licensed Reddit content for AI training.

If a search is restricted to a specific website with the well-known "site:reddit.com" operator, even the largest Google alternatives such as Microsoft's Bing, DuckDuckGo, Mojeek, and Qwant return only older results, reports 404 Media. According to the report, Reddit has been blocking these search engines for around a week, so their web crawlers can no longer search and index the platform's content. Only search engines such as Kagi, which draw on Google's index, still deliver up-to-date Reddit content.

Reddit had already threatened to exclude search engines in the fall of 2023, because it wants money from AI companies that train AI technology on its content. As the social news aggregator is one of the most valuable sources of training data, Reddit negotiated with a number of AI companies. One of them, Google, was granted access to the platform for 60 million dollars. Now search engines without a licensing deal have apparently indeed been locked out.

Reddit, however, denies a connection. "This has absolutely nothing to do with our recent partnership with Google," a Reddit spokesperson told The Verge. "We have been in talks with several search engines. We have not been able to reach an agreement with all of them, as some are unable or unwilling to make enforceable commitments regarding their use of Reddit content, including its use for AI."

Like virtually all websites, Reddit uses the robots.txt file to tell web crawlers which content, if any, they may scan. Reddit changed this file last month to prevent data scraping. Just like the web crawlers of search engines, which index content for user searches, artificial intelligence (AI) companies also crawl websites, but for the purpose of extracting data to better train their AI models.
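A site that wants to shut out all crawlers while still admitting selected ones typically uses per-bot rules in robots.txt. The snippet below is a simplified illustration of that pattern; the bot names are examples, not Reddit's actual configuration:

```
# Allow one named crawler full access (empty Disallow = nothing blocked)
User-agent: Googlebot
Disallow:

# Block every other crawler from the entire site
User-agent: *
Disallow: /
```

Note that robots.txt is purely advisory: it only works if the crawler chooses to honor it.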

This displeases website operators, as AI chatbots exploit third-party content for their own purposes, and in some cases even reproduce it incorrectly. The robots.txt file is not an insurmountable wall, however: only recently did it emerge that the AI search engine Perplexity ignores robots.txt, displaying information without permission and sometimes even incorrectly.

(fds)