Head of Microsoft AI considers content on the internet to be "freeware"

According to Mustafa Suleyman, a social contract allows content published on the open web to be freely reused, including for AI training. His remarks have met with considerable opposition.

Mustafa Suleyman at the Aspen Ideas Festival

(Image: NBC / YouTube)

Jun 30, 2024 at 8:19 pm CEST

By Nico Ernst

Mustafa Suleyman has been CEO of Microsoft AI, a newly formed division of Microsoft, since March 2024. A few days ago, the DeepMind co-founder made his first major appearance at the Aspen Ideas Festival in Colorado, where remarks he made during a lengthy interview caused quite a stir in the US media.

Interviewed by CNBC anchor Andrew Ross Sorkin, Suleyman did not hold back with bold claims about the current state of artificial intelligence development. His sweeping statements on the legality of collecting all publicly available content on the internet drew particular attention: "For content that is already on the open web, there has been a social contract since the 1990s that 'fair use' applies. Anyone can copy it, make something new from it, reproduce it. If you like, that was freeware, that was the understanding."

Suleyman added only one caveat: "There was a separate category where a website, a publisher, or a media company explicitly said, don't search and collect from us except for the purpose of creating an index so that others can find that content. That's a gray area." Asked what he meant by this gray area, Suleyman replied: "So far, some people have taken that data, I don't know who hasn't, and that's now being sorted out in court." Sorkin did not ask what form this "social contract" was supposed to take. Nor were other issues, such as copyright and personality rights, raised in connection with the claim that content on the internet is "freeware". Suleyman's remarks can be found from minute 14:30 in a video NBC has published on YouTube.

Undefined "fair use"

What exactly Suleyman means by "fair use" is not discussed in the interview. In US copyright law, "fair use" is an exception that permits the use of copyrighted works without a license for specific purposes, such as criticism, commentary, news reporting, teaching, scholarship or research. The US outlet The Verge points out that fair use is not established by a social, i.e. unwritten, contract; it is decided by courts when someone accused of infringement invokes it. The court then weighs, case by case, whether a use that would otherwise infringe copyright is permissible, considering the purpose and character of the use, the nature of the work, the amount used and the effect on the market for the original.

The backdrop to Suleyman's claims is the enormous appetite for data among large AI companies, which the industry itself acknowledges, along with several legal disputes. The New York Times, for example, has sued OpenAI and Microsoft, arguing among other things that, given certain prompts, ChatGPT reproduces the newspaper's paywalled articles almost verbatim. Other media companies, such as the news agency Reuters, are now licensing their content for AI training. Even freely available content has been mined so extensively that OpenAI is reported to have converted thousands of hours of YouTube videos into text with its "Whisper" transcription software in order to obtain more training data.