Laion: AI should be able to recognize fear in a voice
AI should be able to recognize emotions in order to react better to the user. Laion and Intel publish data sets, benchmarks and models.
In a video, the emotions are named by an off-screen voice.
(Image: Screenshot of a Laion blog post)
EmoNet is an open-source suite designed to help AI systems recognize emotional signals in people's voices and facial expressions. It was jointly developed by Laion and Intel and is freely available. The suite includes models, data sets and benchmarks.
Laion, an open-source initiative based in Hamburg, is known for providing the data set on which the Stable Diffusion image generator was trained. In a blog post, Laion writes: “An exciting area of technology today is the pursuit of artificial intelligence that truly understands humans and interacts with them on a deeper level.” Although there has been enormous progress in areas such as language processing, one “crucial dimension” has not yet been realized: “true emotional intelligence”.
AI recognizes fear
In the future, AI will apparently be able to recognize “the quiet trembling of fear in a voice”. Laion believes that this is not just a “fascinating academic endeavor”, but “a fundamental necessity for the future of collaboration between humans and AI”. The focus is on both voice and facial expressions.
With EmoNet-Face, Laion offers a benchmark including a database of more than 200,000 synthetic face images spanning different origins and demographics. EmoNet-Voice is a benchmark for recognizing emotions in speech and comprises 4,692 audio examples of synthetic voices. The taxonomy covers 40 emotion categories, including cognitive states such as concentration, confusion and doubt; physical states such as pain, fatigue and intoxication; and social emotions such as shame and pride.
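To illustrate how a benchmark over such a category taxonomy might be scored, here is a minimal sketch. The category names and the (gold, predicted) data layout are illustrative assumptions, not Laion's actual schema or evaluation code:

```python
from collections import Counter

# Illustrative subset of a 40-category emotion taxonomy
# (assumed names, not Laion's official list).
CATEGORIES = ["concentration", "confusion", "doubt",
              "pain", "fatigue", "intoxication",
              "shame", "pride"]

def per_category_accuracy(examples):
    """Compute accuracy per emotion category from (gold, predicted) pairs."""
    correct, total = Counter(), Counter()
    for gold, predicted in examples:
        total[gold] += 1
        if gold == predicted:
            correct[gold] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

results = per_category_accuracy([
    ("pain", "pain"),
    ("pain", "fatigue"),
    ("pride", "pride"),
])
print(results)  # {'pain': 0.5, 'pride': 1.0}
```

Per-category scores matter for a taxonomy this fine-grained, since overall accuracy can hide failures on rare states such as intoxication or doubt.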
(Image: Screenshot Laion)
A video with a still image shows sentence after sentence being spoken by a voice, with the detected emotions displayed underneath. For example, the voice says it is going to a film festival; underneath appear labels such as enthusiastic, interested and optimistic.
Based on its data sets, Laion has also developed two AI models of its own: Empathic Insight-Face, which recognizes emotions from faces, and Empathic Insight-Voice, which recognizes emotions from voices.
Emotions for better AI applications
Models that can recognize emotions are not prohibited per se. However, the AI Act regulates certain applications of this technology. While the ability to understand laughter may be needed, for example, to depict a happy person in a generated image, emotion recognition in the workplace is taboo. There are exceptions here, too, such as monitoring whether an airplane pilot is tired.
Laion aims to use emotion recognition to create better AI assistants. “Capturing expressions enables AI assistants to become more empathetic, engaged and supportive: traits that are critical for transformative applications in education, mental health, companionship and beyond.” Furthermore, the association looks forward to a future where every foundation model can be as good at voice acting as Robert De Niro and Scarlett Johansson.
With Bud-E Whisper, Laion is also presenting an extension of OpenAI's transcription AI Whisper. It transcribes not only the pure linguistic content, but also the emotional tone of voice, vocal events such as laughter or gasping for air, and speaker attributes such as age and gender. To develop Bud-E Whisper, 5,000 hours of public vlogs and online diaries, as well as movie dialogs, were used; Gemini Flash was responsible for annotating the emotions.
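A transcript enriched in this way needs more structure than plain text. The sketch below shows one plausible shape for such an annotated segment; the field layout is an assumption for illustration, not Bud-E Whisper's actual output format:

```python
from dataclasses import dataclass, field

@dataclass
class AnnotatedSegment:
    """One transcript segment with paralinguistic annotations.

    Assumed sketch of what an emotion-aware transcription might emit,
    not the actual Bud-E Whisper schema.
    """
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # plain transcribed speech
    emotions: list = field(default_factory=list)      # e.g. ["enthusiastic"]
    vocal_events: list = field(default_factory=list)  # e.g. ["laughter"]
    speaker_age: str = "unknown"     # coarse bucket, e.g. "adult"
    speaker_gender: str = "unknown"

seg = AnnotatedSegment(
    start=0.0, end=3.2,
    text="I'm going to a film festival!",
    emotions=["enthusiastic", "optimistic"],
    vocal_events=["laughter"],
)
print(seg.text, seg.emotions)
```

Keeping emotions and vocal events as separate lists mirrors the article's distinction between tone of voice and events like laughter or gasping.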
(emw)