On thin copyright ice: Apple trains AI models with web content

The company uses freely accessible content for "Apple Intelligence". Apple is only now revealing the option to opt out.

Apple logo (Image: Sebastian Trepesch)

This article was originally published in German and has been automatically translated.

In addition to licensed content, Apple uses publicly accessible web content to train its new AI models, as the company has now admitted. To build its own foundation models, it draws on content captured by its web crawler "Applebot", apparently regardless of the license terms that apply on the respective website.

According to Apple, those who do not wish to make their texts and images available for "Apple Intelligence" training can opt out. It remains unclear how much data, and which data, has already been used to train Apple's AI. To opt out, website operators and content providers must instruct the dedicated "Applebot-Extended" user agent to ignore their content. Opting out does not stop Applebot from crawling a website: crawling continues unless Applebot itself is also disallowed in the robots.txt file, the company notes.
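As an illustration of how such an opt-out might look in practice, the following is a minimal sketch rather than Apple's documented procedure. It assumes the opt-out is expressed with the usual robots.txt directives (`User-agent` and `Disallow`), and it uses Python's standard `urllib.robotparser` to check the effect. The rule set, the helper name `is_allowed` and the `example.com` URL are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block only the AI-training agent.
# The directives follow general robots.txt conventions; the exact
# rules a site needs are an assumption, not taken from the article.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /
"""


def is_allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # hypothetical URL
    for agent in ("Applebot-Extended", "Applebot"):
        print(agent, "allowed:", is_allowed(agent, page))
```

Under these assumed rules, the check reports "Applebot-Extended" as blocked while the regular "Applebot" remains free to crawl, which matches the company's note that crawling continues unless it is rejected separately.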

Apple's approach is similar to that of other major AI providers, who have also used freely accessible web content to train their models and have thus set themselves on a collision course with publishers and content creators. According to previous reports, Apple approached several major US publishers last year about licensing content and is already paying for image content for AI training. There was therefore speculation in the industry that Apple might limit itself entirely to licensed content.

In a recent interview, Apple CEO Tim Cook advised journalists to license their content for AI training. Cook said that this is "really smart for some people" and that he does not see what could be bad about licensing, unless the deal is a poor one.

Other AI companies, such as Apple partner OpenAI, insist that training AI on freely accessible content is fundamentally "fair" and that such training is practically "impossible" without access to copyright-protected content. At the same time, more and more deals are being concluded with publishers and website operators.

Among creatives and content producers, often regular customers of Apple, there is increasing resistance to the unsolicited use of their works for AI training. Apple recently felt the extent of the anger: after a storm of indignation, the company apologized for an iPad commercial in which a giant scrap press crushes musical instruments, among other things.

(lbe)