Perplexity violates AWS guidelines – Amazon investigates

Perplexity runs on the infrastructure of Amazon Web Services. AWS, however, apparently prohibits the way the AI answer engine operates.

Typing hands, with symbols and the letters AI floating above them.

(Image: Shutterstock/Poca Wander Stock)

This article was originally published in German and has been automatically translated.

Perplexity.ai currently seems to have become the designated villain among AI providers. The answer engine, as the company calls its AI search engine with advanced functions, now faces an investigation by Amazon Web Services (AWS). At issue is the robots.txt file, which website operators use to tell crawlers which pages they should not access. Perplexity is apparently not adhering to this standard, yet AWS states in its terms of use that it must be adhered to.
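How a well-behaved crawler consults robots.txt before fetching a page can be illustrated with a minimal sketch based on Python's standard-library urllib.robotparser. The robots.txt content, the example.com URLs and the helper name is_allowed are assumptions made up for illustration; only the user agent name PerplexityBot comes from the reporting. The sketch says nothing about how Perplexity's crawler actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a publisher: PerplexityBot is excluded from the
# whole site, all other crawlers only from the /private/ directory.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is needed for this illustration.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("PerplexityBot", "SomeOtherBot"):
        for url in ("https://example.com/article",
                    "https://example.com/private/draft"):
            verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
            print(f"{agent:14} {url:36} {verdict}")
```

The check is purely voluntary: nothing in the file technically prevents a crawler from skipping it and fetching the page anyway, which is why the dispute turns on terms of use rather than on the standard itself.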

For some time, Perplexity has been accused of sending its crawlers to websites that have denied them access. The tech magazine Wired observed this behavior and was even able to identify individual IP addresses. Wired belongs to the publisher Condé Nast, and other Condé Nast titles were affected as well. Forbes has also sharply criticized Perplexity, not only because the content of its pages is scraped, but also because that content appears in the answer engine itself without the sources being named. Perplexity has developed so-called Pages, which resemble Wikipedia articles and each cover a single topic. One such page, for example, is based exclusively on Forbes' investigative reporting on former Google CEO Eric Schmidt. Instead of simply naming Forbes, Perplexity provides only very small links to numerous media outlets that in turn refer to Forbes. According to Forbes, this has generated hardly any page views.

Although Perplexity's Pages feature makes the problem of ignored robots.txt rules particularly visible, the issue also affects other AI search engines and AI chatbots. They all use content and reproduce it without the content creators benefiting. The question of training data has also been known for a long time. OpenAI, for example, has struck deals with individual publishers to use their content for training its AI models and to display their publications prominently. At the same time, publishers and artists have filed several lawsuits against OpenAI, arguing that its use of their works infringes copyright.

OpenAI runs its AI models on infrastructure provided by Microsoft, and Google can rely on its own infrastructure. Perplexity, however, uses AWS. Wired writes that it asked AWS whether Perplexity uses AWS infrastructure to scrape websites that prohibit it. While robots.txt is a standard that is generally followed but not binding, the terms of use are a different matter and must be complied with. According to Wired, its inquiry prompted an investigation by AWS. There is no result yet.

Perplexity, for its part, says that its PerplexityBot, which runs on AWS, respects the robots.txt standard and therefore does not violate the AWS terms of use. However, the bot behaves like a human visitor as soon as someone enters a specific URL in the search. In that case it no longer adheres to robots.txt. Perplexity's CEO Aravind Srinivas recently explained in an interview with heise online that a new analytics system is needed, one that counts and pays for the use of content rather than for clicks.

In general, many AI providers, researchers and investors seem to have a very different understanding of how the internet works. Srinivas, for example, accuses Wired of not understanding the internet. Mustafa Suleyman, a co-founder of DeepMind, which Google later acquired, likewise believes that everything on the internet is "freeware". Suleyman has been CEO of Microsoft AI, Microsoft's independent AI company, for several months. "Fair use" is another peculiarity of US copyright law that AI bosses like to invoke. Put simply, it says that content may be used if the use benefits everyone. Whether an AI model that brings economic benefits to one company benefits everyone equally is doubtful.

(emw)