Anthropic can partially fend off copyright lawsuit from book authors

Anthropic copied books without permission and used them for LLM training. Admissibility depends on the method of procurement, says a US court.

Several open books lined up directly next to each other

(Image: Daniel AJ Sokolov)

Anthropic has achieved a partial victory in the dispute over possible copyright infringement in the training of large language models (LLMs) with unlicensed copies of books. A US federal district court has partially granted Anthropic's motion for summary judgment. According to the ruling, using the copies for AI training is permissible; only the downloading of electronic books from “pirate sites” is illegal. Both Anthropic and the book authors concerned can appeal.

Dozens of lawsuits alleging copyright infringement by AI operators are pending in the USA. In this case, three book authors, Andrea Bartz, Charles Graeber and Kirk Wallace Johnson, have sued. Anthropic has built a digital library, without licenses, that is meant to contain as many of the world's books as possible. The proceedings before the US District Court for the Northern District of California concern several categories of conduct:

  • Anthropic downloaded and stored well over seven million e-books from illegal sources on the Internet.
  • Anthropic purchased (generally used) print editions, scanned them in their entirety with text recognition, and destroyed the print editions.
  • Anthropic made countless additional copies of many digital books (from both sources) to train various LLMs.
  • Anthropic has also made additional copies for other purposes. However, these copies were not shared with third parties outside the company.

The complaint in this case does not allege that Anthropic's LLMs reproduced copyrighted texts for their users. This is because specially installed filter software has prevented this (at least so far). The lawsuit also does not address the production and use of further copies of works for this filter software.

Anthropic asked the court to recognize all of the alleged conduct as fair use and to end the proceedings. The aim of US copyright law is to “promote the progress of science and useful arts”. Where a use helps to achieve this goal, other people's works may be used even if the rights holders do not agree. This doctrine is known as fair use. The law does not define conclusively when exactly a use qualifies as fair use, and doing so would be difficult.

In the event of a dispute, four factors must be examined: the purpose of the use (commercial, non-commercial or educational), the nature of the work, the amount used relative to the work as a whole, and finally the impact on the potential market for or value of the work. The four results are then weighed against each other.

This is what the Federal District Court did. In doing so, it divided the facts of the case into three parts and ruled as follows:

The purpose of the use (1st factor) weighed in favor of fair use because the use was “spectacularly” transformative. Anthropic's aim was not to replace the works used, but to create new texts using artificial intelligence.

The nature of the works (2nd factor), on the other hand, weighed slightly against fair use, regardless of whether they were non-fiction or fiction.

As far as the volume copied (3rd factor) is concerned, Anthropic has indisputably used entire books, and a great many of them. While this was not strictly necessary for LLM training (Anthropic could, for example, have paid authors to write new texts, or simply used fewer works or works by authors other than the plaintiffs), it was reasonably necessary. And the latter is the legal standard.

Anthropic had used particularly good books, which had “compelling advantages”. The judge therefore finds, surprisingly, that the 3rd factor (extent of use) weighs in favor of fair use, even though the defendant copied entire books.

As to the impact on the potential market for or value of the work (4th factor), the court found that Anthropic's use of the works to train its LLMs did not displace demand for copies of the works. While the unlicensed practice may prevent a market for licensing works for LLM training from emerging, this economic goal of the book authors is not covered by copyright law. However, the judge does not stop at treating the 4th factor as neutral, but finds, surprisingly, that the use of the entire books weighs in favor of fair use.

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.