AI-based robots learn better from audio data

AI-based robots can also be trained with audio information. This improves the robot's performance, according to researchers at Stanford University.


The ManiWAV device records synchronized video and audio for training AI-based robots.

(Image: Zeyi Lucia Liu (Screenshot))


A team of scientists from Stanford University and the Toyota Research Institute has found that AI-based robots train better when audio data is used in addition to video data. According to the researchers, this significantly improves the speed and accuracy of the skills the robots learn.

When training AI-based robots, large amounts of visual information are typically used to teach the robot certain skills. Audio data is generally not used, but simply ignored. In the study "ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data", published as a preprint on arXiv, the researchers investigated whether and to what extent audio information can improve training results.

During training, the robots receive audio information recorded with an "ear-in-hand" data-acquisition device called ManiWAV. The device captures human demonstrations with a microphone and a camera; audio and video are strictly synchronized. The recordings are then fed to the robot via an interface for learning manipulation policies.
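The synchronized capture described above can be sketched as a simple pairing step: each video frame is bundled with the audio recorded just before it. This is an illustrative sketch, not the authors' code; the function names and the one-second audio window are assumptions.

```python
# Sketch: pairing time-stamped audio chunks with video frames into
# synchronized training samples, as an "ear-in-hand" recorder like
# ManiWAV might produce them. All names and the window length are
# illustrative assumptions, not taken from the study.

from bisect import bisect_right

def pair_audio_video(frames, audio_chunks, window=1.0):
    """frames: list of (timestamp, frame_data).
    audio_chunks: time-sorted list of (timestamp, chunk_data).
    Returns one training sample per frame: the frame plus all audio
    chunks recorded within the preceding `window` seconds."""
    times = [t for t, _ in audio_chunks]
    samples = []
    for t_frame, frame in frames:
        lo = bisect_right(times, t_frame - window)  # first chunk in window
        hi = bisect_right(times, t_frame)           # chunks up to the frame
        audio = [chunk for _, chunk in audio_chunks[lo:hi]]
        samples.append({"t": t_frame, "frame": frame, "audio": audio})
    return samples

# Tiny demonstration with placeholder data:
frames = [(0.5, "f0"), (1.0, "f1")]
audio = [(0.1, "a0"), (0.4, "a1"), (0.9, "a2")]
samples = pair_audio_video(frames, audio)
```

Keeping the audio strictly time-aligned with each frame is what lets a learned policy correlate sounds (e.g., contact noise) with what the camera sees at that moment.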

To test their assumptions, the researchers conducted four experiments in which a robot had to learn a new skill. In the first, the robot learned to flip a bagel in a frying pan using a spatula. In another, it was trained to erase a drawing from a whiteboard with an eraser. The third tasked it with pouring cubes from one cup into another. In the last, it had to select the right size from three different adhesive strips to attach a cable to a plastic strip.


In all four cases, the researchers used the same robot, consisting of a multi-axis arm and a two-finger gripper. The activities to be learned were recorded with the ManiWAV device, which supplies both audio and video; for comparison, they were also recorded with video only. Both sets of material were used to train the robot. The researchers found that for some tasks, the robot's speed and accuracy improved when audio information was included in training.

This applied to tasks in which sound carries useful feedback. When pouring, the robot could tell from the rattling whether cubes were still in the cup; when erasing, the sound of the eraser helped it apply the correct contact pressure.
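The rattling-cubes cue can be illustrated with a toy example: remaining cubes produce a louder signal than an empty cup, so even a simple energy measure separates the two cases. This is a hypothetical sketch for illustration only; the threshold and signals are made up and are not how the study's learned policy works.

```python
# Toy illustration (assumption, not from the study): a simple
# audio-energy cue for deciding whether cubes remain in a shaken cup.
# Rattling cubes yield higher signal energy than a near-silent cup.

import math

def rms(samples):
    """Root-mean-square energy of an audio window."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def cubes_remaining(window, threshold=0.1):
    """True if the window's energy suggests cubes are still rattling.
    The threshold is an invented value for this illustration."""
    return rms(window) > threshold

# Synthetic stand-ins for recorded audio windows:
rattle = [0.4 * math.sin(0.3 * n) for n in range(100)]    # loud rattling
silence = [0.01 * math.sin(0.3 * n) for n in range(100)]  # empty cup
```

A learned policy would extract such cues implicitly from the audio stream rather than using a hand-set threshold, but the underlying signal difference is the same.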


The audio information proved less helpful when flipping the bagel: the robot could not tell from the sound whether the bagel had actually been flipped.

The scientists conclude that audio data in AI training material for robots does not always lead to improved performance. For certain training scenarios, however, it can be advantageous to use audio information in addition to a video.

(olb)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.