AI training: Major US sites exclude Apple

Media companies and major websites in the USA are shutting Apple out. For the iPhone manufacturer, this is a hurdle for its AI efforts.



Several large internet portals and media companies in the USA are apparently blocking Apple to prevent the company from using text and images from their websites for AI training. According to a media report, however, the main reason could be that the providers want license fees from Apple and did not consider the company's previous offers good enough. As Apple is still at the beginning of its AI efforts, access to high-quality training data is particularly important right now.

The Wired report names Facebook, Instagram, Craigslist, Tumblr, the New York Times, Financial Times, Vox Media, USA Today and Condé Nast, among others, as companies that have blocked Apple's bot on their websites. When it presented Apple Intelligence at the WWDC developer conference, Apple explicitly emphasized that website operators can decide for themselves whether their site is used for training data, in order to dispel fears that the company is using third-party content against its owners' will. However, Apple had apparently already collected data from freely accessible sites before the announcement.

To collect data, Apple uses an extension of its Applebot called "Applebot-Extended", which can be excluded via robots.txt. The original Applebot was used to collect data for Siri and the Spotlight search function. Apple uses the data collected by Applebot to train its own large language models (LLMs). Apple Intelligence is set to launch in the USA in October. European users will probably have to wait until next year to get their hands on it.
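The robots.txt exclusion mentioned above can be sketched roughly as follows. This is a minimal illustration, not Apple's actual processing: the robots.txt content, the example.com URL and the helper name `is_allowed` are hypothetical, and Python's standard `urllib.robotparser` may interpret rules differently from Apple's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opts out of "Applebot-Extended" (AI training)
# while leaving the site open to all other crawlers, including "Applebot".
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # placeholder URL
    for agent in ("Applebot-Extended", "Applebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

Under these assumptions, a site operator could block only the training-related user agent while still allowing the original Applebot used for Siri and Spotlight.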

Overall, however, comparatively few of the well-known, high-traffic websites have made use of the option to block Apple, reports Wired. The AI start-up "Originality AI" posed as an Apple bot and was rejected by only seven percent of the 1,000 sites tested. The website "Dark Visitors" found that six percent of sites denied Apple access. Data journalist Ben Walsh found that a quarter of the 1,167 English-language US news sites he tested denied Apple access. This is still very low compared to OpenAI (53 percent) and Google (43 percent), but the number is increasing, which Walsh attributes to the fact that Apple is still new to the AI market.

Experts assume that media companies and large internet providers see an opportunity to generate new revenue through strategic partnerships. OpenAI, for example, has already entered into a number of collaborations with media companies. Apple could also be tempted to enter into such partnerships in order to increase the quality of its training data and thus the quality of its AI.

(mki)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.