AI headset enables targeted listening to a person

Concentrating on a single voice during a conversation can be difficult in noisy environments. AI in a headset can help with this.

The AI headset worn on the head.

The University of Washington's AI headset can pick out a single voice from a jumble of conversations.

(Image: Kiyomi Taguchi / University of Washington)

This article was originally published in German and has been automatically translated.

A team of scientists at the University of Washington (UW) has upgraded a headset with an AI-based system that lets it listen to a single person in a crowd. It works in real time, while moving, and in noisy environments.

The scientists describe the headset in the study "Look Once to Hear: Target Speech Hearing with Noisy Examples", which they published in Proceedings of the CHI Conference on Human Factors in Computing Systems. The wearer lets the AI listen to a speaking person for around three to five seconds to register their voice. The system, called Target Speech Hearing (TSH), then blocks out all other sounds in the environment and plays back only the voice of the registered person in real time. According to the researchers, it does not matter whether that person is moving, leaves the wearer's line of sight, or how loud the surroundings are.
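The enroll-then-filter flow described above can be sketched at a very high level. The snippet below is a conceptual illustration only: the frame size, the spectral "voice profile", and the crude masking step are all assumptions standing in for the learned neural networks the actual TSH system runs on its embedded computer.

```python
import numpy as np

FRAME = 512  # assumed processing frame length in samples

def embed_voice(enrollment_audio: np.ndarray) -> np.ndarray:
    """Register a voice: average the magnitude spectrum over the
    3-5 second enrollment clip (stand-in for a learned embedding)."""
    n = len(enrollment_audio) // FRAME
    frames = enrollment_audio[:n * FRAME].reshape(n, FRAME)
    profile = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    return profile / (np.linalg.norm(profile) + 1e-9)

def filter_frame(mixture_frame: np.ndarray,
                 voice_profile: np.ndarray) -> np.ndarray:
    """Suppress sounds unlike the registered voice: keep only frequency
    bins prominent in the voice profile (a crude spectral mask, not the
    real separation model)."""
    spec = np.fft.rfft(mixture_frame)
    mask = voice_profile > np.median(voice_profile)
    return np.fft.irfft(spec * mask, n=FRAME)
```

In the real system, the enrollment embedding conditions a neural separation network that processes each incoming audio frame with low enough latency for live playback.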

"With our devices, you can now hear a single speaker clearly and distinctly, even if you are in a noisy environment where many other people are talking", says Shyam Gollakota, Professor at the Paul G. Allen School of Computer Science & Engineering.

To use the headset, the wearer simply points their head towards the person speaking and presses a button so that the AI system can focus on the speaker and register them. Registration relies on the sound waves of the speaker's voice reaching the microphones on both sides of the headset at the same time; a tolerance of 16 degrees is allowed. The headset sends the captured audio signal to an integrated computer, where machine-learning software analyzes the voice pattern of the targeted speaker and thus remembers their voice.
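The "same time of arrival at both microphones" criterion can be illustrated with a small time-difference-of-arrival (TDOA) calculation. Apart from the 16-degree tolerance reported in the article, all names and numbers below (microphone spacing, function names) are assumptions for the sketch, not details of the actual device.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, at roughly 20 degrees Celsius
MIC_SPACING = 0.18       # m, assumed distance between the two ear microphones
TOLERANCE_DEG = 16.0     # angular tolerance reported in the study

def arrival_angle(tdoa_seconds: float) -> float:
    """Estimate the speaker's angle from straight ahead, in degrees.

    A TDOA of 0 means the sound hit both microphones simultaneously,
    i.e. the speaker is directly in front of the wearer (angle 0).
    """
    # sin(angle) = path-length difference / microphone spacing
    sin_a = max(-1.0, min(1.0, tdoa_seconds * SPEED_OF_SOUND / MIC_SPACING))
    return math.degrees(math.asin(sin_a))

def within_tolerance(tdoa_seconds: float) -> bool:
    """True if the estimated direction lies within the 16-degree cone."""
    return abs(arrival_angle(tdoa_seconds)) <= TOLERANCE_DEG

print(within_tolerance(0.0))     # True: speaker straight ahead
print(within_tolerance(0.0004))  # False: angle is about 50 degrees off-axis
```

This is why the wearer must face the speaker during the button press: only then is the inter-microphone delay close enough to zero for the system to lock onto that voice.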

The system then plays back the voice through the headphones in real time. According to the researchers, the result remains clear even while the speaker or listener is moving. Recognition performance improves the longer the system can listen to the registered speaker, as it collects more training data in the process.


The researchers tested the system with a total of 21 test subjects. On average, they rated the clarity of the speaker's voice as twice as good as the unfiltered audio data.

However, the researchers admit that the system still has some limitations: it can register only a single speaker, and only if no other loud voice is coming from the same direction. A speaker can also be re-registered, for example to improve the sound quality.

The research team now plans to apply the results to hearing aids. The scientists hope this will enable hearing-impaired people to focus on individual speakers in a more targeted manner.

(olb)