Mark Zuckerberg is said to have allowed AI training with pirated copies
In a US legal dispute over the training of AI models, statements have come to light that incriminate Meta CEO Mark Zuckerberg and his developers.
Meta CEO Mark Zuckerberg is said to have personally allowed his developers to use pirated content to train Meta's AI models. The developers also deliberately removed copyright notices from the material. This is what the lawyers of several prominent US authors are accusing Meta of in a legal dispute before a Californian court.
In the recently published court documents (PDF), the authors' lawyers cite statements made by Meta employees and internal Meta correspondence. According to the documents, Meta's AI team is said to have received approval to use LibGen data for training the Llama models after "escalation to MZ", where "MZ" stands for Mark Zuckerberg. The authors' lawyers describe LibGen as a collection of pirated copies of copyrighted works. According to the filing, all decision-makers, including Zuckerberg, were aware that the data was pirated.
Authors have already suffered setbacks in court
The Meta developers are said to have hesitated at first to use the data. One employee wrote: "Using file-sharing services on a company laptop doesn't feel right." Meta is also said to have admitted to removing copyright notices from LibGen e-books and scientific articles. The authors' lawyers believe that, by doing so, the company also wanted to prevent references to copyrighted material from appearing in the AI's answers to users' questions. According to the document, the Meta developers also had to upload copyrighted material to file-sharing platforms themselves in order to download anything at all.
In the legal dispute (Kadrey et al. v. Meta Platforms, Inc.), the US authors Sarah Silverman, Richard Kadrey and Christopher Golden accuse Facebook's parent company Meta of, among other things, illegally using their books to train AI models. They also allege that the answers of the AI models themselves violate copyright law. In September 2023, the court rejected most of the authors' allegations, but not the allegation that training the AI models on the protected works infringed copyright. Meta had not responded to an inquiry from c't by the time this report was published.
(cwo)