AI definition: Accusations against the Open Source Initiative
Harassment and insults: The process of determining the open source AI definition was psychologically stressful for participants.
A LinkedIn post by Julia Ferraioli, a long-standing member of the Open Source Initiative (OSI), is currently attracting a lot of attention. In it, she announces her resignation from the initiative, citing the behavior of some other members and the tone of the process of defining open source AI. That process produced a narrow definition that leaves no room for ostensibly open AI models from Meta or Google, for example, since their training data is not publicly available. Not all AI providers agree with the definition either.
Ferraioli titled her post "Content warning: mental health, open source AI and harassment". In it, she writes that for many participants the process was heartbreaking: there was a lack of transparency, distorted narratives, and deceptive arguments. Even more, she complains about how participants were treated. Ferraioli says she was called a liar and an enemy of open source, and was told to keep quiet.
Ferraioli fears that companies will respond by making their technology even less freely available. "It's about the very meaning of open source and the OSI's insistence on undermining it on a technical and cultural level." Many commenters on the post share Ferraioli's concerns and experiences.
Further criticism of the OSI definition of AI
Criticism of the definition itself is also being voiced elsewhere. Opinions are particularly divided on whether the training data must be made freely accessible. Many supporters of the published definition see this as an obligation, so that AI models can be reproduced in their entirety; others argue that the data is less important. Some of it, for instance, is subject to strict data protection rules: copyright is one issue, and sensitive data such as that used to train medical AI systems is another. One argument in favor of disclosing training data is that open source would otherwise reinforce the monopoly of the large AI providers. Others counter that even with the data, smaller providers would lack the computing power to do anything with it.
Unsurprisingly, Meta, which had previously declared its Llama family of AI models to be open source, is also among the critics. Under the new definition, that label no longer applies. When asked by heise online, a spokesperson said that, like many others, Meta does not agree with the definition: "There is no single valid definition of open source AI, and defining it is a challenge because previous open source definitions do not cover the complexity of today's rapidly evolving AI models." Nevertheless, the company wants to continue working with the OSI, unlike Ferraioli and like-minded members, who have announced that they will end their memberships. Meta also says: "We provide Llama free and open, and our license and terms of use help protect people with some restrictions."
(emw)