YouTube videos in AI training: Apple Intelligence built without data from "The Pile"

Apple is said to have been one of the companies that used the controversial training database The Pile. The company denies this for its new AI.

Apple Intelligence logo and icon.

(Image: Apple)

This article was originally published in German and has been automatically translated.

Apple Intelligence was not trained on the freely available dataset The Pile, which contains subtitles from thousands of YouTube videos collected without asking their creators. The company said so in a statement to the Apple blog 9to5Mac. In a scientific paper on its OpenELM series of high-efficiency models, Apple had written that the dataset was used. However, OpenELM is not part of any AI system the company uses, whether Apple Intelligence or other machine learning technology.

According to 9to5Mac, Apple said it developed OpenELM as a contribution to AI research and to the advancement of open-source language models. At the time, the company described the technology as a "cutting-edge open language model". However, OpenELM was developed for research purposes only, not to power any Apple Intelligence features. OpenELM is still available on Apple's AI research website.

Criticism of the training dataset "The Pile", which originates from the non-profit organization EleutherAI, was raised in a report by Proof News, according to which other large companies such as Nvidia, Anthropic and Salesforce have also used the data. "The Pile" is said to have been fed with subtitles from 170,000 YouTube videos, among other things. No authorization is said to have been obtained for this.

It is still not clear exactly which training data Apple uses for Apple Intelligence, or how much of it. The company only states that it uses "licensed content, including data that improves specific functions". However, there is also data that Apple itself appears to have collected from the public internet with its web crawler.

To opt out, website operators must instruct the special "Applebot-Extended" user agent to ignore their content. Crawling of websites by the regular Applebot (which is not used for AI purposes, but for other services) continues even after opting out unless it is also rejected in the "robots.txt" file, the company writes on Apple.com. It is also known that the company does not include personal user data or "user interactions" in the training. There are also filters for credit card data, "obscenity" and low-quality content, although it is unclear how these are excluded.
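
As a rough illustration of the opt-out described above, the following sketch builds a "robots.txt" that rejects "Applebot-Extended" while still admitting the regular Applebot, then checks the result with the parser from Python's standard library. The file contents, the example.com URL and the helper name is_allowed are assumptions made for this example. The standard-library parser only approximates how a crawler might read the file and says nothing about how Apple's own systems evaluate it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption for this sketch).
# The first group opts out of AI training use; the second keeps the regular
# Applebot allowed. The more specific agent is listed first because Python's
# parser matches agent names by substring and uses the first matching group.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article.html"  # placeholder URL
    for agent in ("Applebot-Extended", "Applebot"):
        print(agent, "allowed" if is_allowed(agent, page) else "blocked")
```

An operator who also wants to keep the regular Applebot out would change the second group to "Disallow: /" as well.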

(bsc)