ChatGPT imitated the user's voice during internal OpenAI tests of the voice mode

The AI chatbot responded with the voice of a user during extended tests of the "Advanced Voice Mode". This should actually be impossible.

listen Print view
ChatGPT app on a smartphone

(Image: Tada Images/Shutterstock.com)

3 min. read

The voice mode recently introduced by OpenAI for ChatGPT is still only a function in the alpha stage. Only a few (paying) users can communicate with the AI chatbot by voice. In this natural conversation, it should be impossible for ChatGPT to accept users' voices. However, this is exactly what has happened in individual cases during internal tests of the "Advanced Voice Mode".

This can be seen in the report on GPT-4o presented by OpenAI at the end of last week. It explains how the advanced voice mode has been made safer and more convenient. It also includes undesirable side effects and problem cases, such as this "unauthorized voice generation", how OpenAI has recognized and dealt with them.

In the example cited by OpenAI, the AI model responds to a sentence from the user with a firm "No!" and continues with the voice of the Red team member who can be heard at the beginning of the audio clip published on Reddit. Such people carry out controversial tests for the companies. According to OpenAI, ChatGPT should not be able to imitate other people's voices, even with extended voice mode. There are four preset voices that were developed in collaboration with voice actors.

Videos by heise

The fact that the AI model responds to a natural conversation with a user's voice is not only unexpected, but also uncanny. OpenAI actually has safety precautions in place to avoid these cases. However, these are isolated cases and the risk of occurrence is minimal, but it can still happen. This is also one of the reasons why the language mode has not yet been rolled out on a broad scale. As the risk is higher for languages other than English, OpenAI still needs to work on this.

OpenAI does not give a reason for the voice imitation in the case shown, but Ars Technica suspects that it could be due to background or background noise. As the AI model not only hears the voice, but also birdsong or traffic noises, for example, this could lead to unexpected results, similar to prompt injections. These are attacks that can be used to infiltrate software that is based on voice AIs such as (chat)GPT.

Prompt injections are an analogy to remote code execution from classic IT security: if successful, the attacker takes control of the underlying voice AI and can dispose of everything that this voice AI has access to. In this case, the AI could have been manipulated to ignore the default voice and imitate the actual voice, albeit unintentionally.

Prompt injections in voice AIs were also recently discussed in the heise security podcast: "Password" episode 7.

Empfohlener redaktioneller Inhalt

Mit Ihrer Zustimmung wird hier ein externer Podcast (Podigee GmbH) geladen.

Ich bin damit einverstanden, dass mir externe Inhalte angezeigt werden. Damit können personenbezogene Daten an Drittplattformen (Podigee GmbH) übermittelt werden. Mehr dazu in unserer Datenschutzerklärung.

(fds)

Don't miss any news – follow us on Facebook, LinkedIn or Mastodon.

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.