Language model: OpenEuroLLM aims to make AI in the EU more independent and diverse

A European consortium wants to build an open source family of powerful, multilingual large language models for private and public services.

In the global competition for powerful artificial intelligence (AI) systems, Europe is entering another project into the race with OpenEuroLLM. The project is backed by a consortium of 20 European research institutions, companies and high-performance computing centers (EuroHPC). The aim is to build a family of powerful, multilingual Large Language Models (LLMs) for commercial, industrial and public services on an open source basis.

The consortium is confident: the planned transparent and EU-compliant open source models "will democratize access to high-quality AI technologies and strengthen the competitiveness of European companies in a global market". OpenEuroLLM contributes to the EU Commission's aim of "improving Europe's competitiveness and digital sovereignty". The project is "a prime example of the kind of technology infrastructure that is needed to lower the barriers to the development and refinement of European AI products".

Of course, international competition is fierce, including among open source LLMs. Big names in this market include Meta's Llama, Google's Gemma and, most recently, the hyped R1 model from Chinese newcomer DeepSeek. However, OpenEuroLLM wants to score points by making not only the model code, the associated software and the evaluation completely open to everyone, but also the training data. This is not the case with competitors from the USA and China. The results of the OpenEuroLLM models should therefore be easier to explain, and the models should also be better "adapted to the specific needs of industry and the public sector".

According to the plan, the new models will be trained directly in 35 languages. These include not only the languages of all EU member states and candidate countries, but also important languages of third countries such as Arabic, Chinese and Hindi. The foundation models should thus reflect linguistic and cultural diversity, which can then be carried over more easily into specific applications.

The collaborators from Germany include the Ellis Institute and the University AI Center in Tübingen, Forschungszentrum Jülich, the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Aleph Alpha from Heidelberg and the Bremen-based start-up Ellamind. The project is coordinated by Czech computational linguist Jan Hajič and Peter Sarlin from the Finnish AI laboratory Silo AI, which is now owned by US chip manufacturer AMD.

According to the EU Commission, OpenEuroLLM currently has a total budget of 37.4 million euros, 20.6 million of which comes from the Digital Europe funding program. Last year's tender documents spoke of 54 million euros spread over several years.

On Monday, the Commission awarded the project the seal of approval of the "Strategic Technologies for Europe" platform (STEP). This is an initiative to increase the competitiveness of European industry through the use of critical technologies such as AI. Through STEP, participants who adhere to European values of transparency and openness receive privileged access to supercomputing centers.

Compared to the 500 billion US dollars that ChatGPT developer OpenAI, together with Oracle and Softbank, wants to invest in AI data centers within four years as part of the US Stargate project, the OpenEuroLLM budget is a mere pittance. However, there are growing doubts as to whether the parties involved actually have the money, and whether such high levels of funding are even necessary for the development of powerful LLMs. The training of the DeepSeek models, at least, is said to have been significantly cheaper.

OpenEuroLLM has announced close cooperation with open source and open science communities such as LAION, Open-Sci and OpenML as well as other experts in this field. The latter are already part of an advisory board for strategic partnerships. The European research project OpenGPT-X, which published the LLM Teuken-7B in November, does not mention OpenEuroLLM directly. Teuken-7B was trained on the 24 official languages of the EU and is also specifically designed to meet the requirements of European values, data protection standards and linguistic diversity. However, the participant structures of both projects overlap significantly, so close cooperation seems unavoidable.

(vbr)

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.