Laion: AI should be able to recognize fear in a voice
AI should be able to recognize emotions in order to react better to the user. Laion and Intel publish data sets, benchmarks and models.
In a video, the emotions are named by an off-screen voice.
(Image: Screenshot of a Laion blog post)
EmoNet is an open-source suite designed to help AI systems recognize emotional signals in people's voices and facial expressions. It was jointly developed by Laion and Intel and is freely available. The suite includes models, data sets and benchmarks.
Laion, an open-source initiative based in Hamburg, is known for providing the data set on which the Stable Diffusion image generator was trained. In a blog post, Laion writes: “An exciting area of technology today is the pursuit of artificial intelligence that truly understands humans and interacts with them on a deeper level.” Although there has been enormous progress in areas such as language processing, one “crucial dimension” has not yet been realized: “true emotional intelligence”.
AI recognizes fear
In the future, AI will apparently be able to recognize “the quiet trembling of fear in a voice”. Laion believes that this is not just a “fascinating academic endeavor”, but “a fundamental necessity for the future of collaboration between humans and AI”. The focus is on both voice and facial expressions.
With EmoNet-Face, Laion offers a benchmark including a database of more than 200,000 synthetic face images spanning different origins and demographics. EmoNet-Voice is a benchmark for recognizing emotions in speech and comprises 4,692 audio examples of synthetic voices. The taxonomy covers 40 emotion categories, including cognitive states such as concentration, confusion and doubt; physical states such as pain, fatigue and intoxication; and social emotions such as shame and pride.
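To illustrate how a benchmark over such a category taxonomy might be scored, here is a minimal sketch. The category names and the (gold, predicted) data layout are illustrative assumptions, not Laion's actual schema or evaluation code:

```python
from collections import Counter

# Illustrative subset of a 40-category emotion taxonomy
# (assumed names, not Laion's official list).
CATEGORIES = ["concentration", "confusion", "doubt",
              "pain", "fatigue", "intoxication",
              "shame", "pride"]

def per_category_accuracy(examples):
    """Compute accuracy per emotion category from (gold, predicted) pairs."""
    correct, total = Counter(), Counter()
    for gold, predicted in examples:
        total[gold] += 1
        if gold == predicted:
            correct[gold] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

results = per_category_accuracy([
    ("pain", "pain"),
    ("pain", "fatigue"),
    ("pride", "pride"),
])
print(results)  # {'pain': 0.5, 'pride': 1.0}
```

Per-category scores matter for a taxonomy this fine-grained, since overall accuracy can hide failures on rare states such as intoxication or doubt.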
(Image: Screenshot Laion)
A video with a still image shows sentence after sentence being spoken by a voice, with the detected emotions displayed underneath. For example, the voice says it is going to a film festival; underneath appear labels such as enthusiastic, interested and optimistic.
Based on its data sets, Laion has also developed two AI models of its own: Empathic Insight-Face, which recognizes emotions from faces, and Empathic Insight-Voice, which recognizes emotions from voices.
Emotions for better AI applications
Models that can recognize emotions are not prohibited per se. However, the AI Act regulates certain applications of this technology. While the ability to understand laughter may be needed, for example, to depict a happy person in a generated image, emotion recognition in the workplace is taboo. There are exceptions here, too, such as monitoring whether an airplane pilot is tired.
Laion aims to use emotion recognition to create better AI assistants. “Capturing expressions enables AI assistants to become more empathetic, engaged and supportive: traits that are critical for transformative applications in education, mental health, companionship and beyond.” Furthermore, the association looks forward to a future where every foundation model can be as good at voice acting as Robert De Niro and Scarlett Johansson.
With Bud-E Whisper, Laion is also presenting an extension of OpenAI's transcription AI Whisper. It transcribes not only the pure linguistic content, but also the emotional tone of voice, vocal events such as laughter or gasping for air, and speaker attributes such as age and gender. To develop Bud-E Whisper, 5,000 hours of public vlogs and online diaries, as well as movie dialogs, were used; Gemini Flash was responsible for annotating the emotions.
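A transcript enriched in this way needs more structure than plain text. The sketch below shows one plausible shape for such an annotated segment; the field layout is an assumption for illustration, not Bud-E Whisper's actual output format:

```python
from dataclasses import dataclass, field

@dataclass
class AnnotatedSegment:
    """One transcript segment with paralinguistic annotations.

    Assumed sketch of what an emotion-aware transcription might emit,
    not the actual Bud-E Whisper schema.
    """
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # plain transcribed speech
    emotions: list = field(default_factory=list)      # e.g. ["enthusiastic"]
    vocal_events: list = field(default_factory=list)  # e.g. ["laughter"]
    speaker_age: str = "unknown"     # coarse bucket, e.g. "adult"
    speaker_gender: str = "unknown"

seg = AnnotatedSegment(
    start=0.0, end=3.2,
    text="I'm going to a film festival!",
    emotions=["enthusiastic", "optimistic"],
    vocal_events=["laughter"],
)
print(seg.text, seg.emotions)
```

Keeping emotions and vocal events as separate lists mirrors the article's distinction between tone of voice and events like laughter or gasping.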
(emw)