ChatGPT imitated the user's voice during internal OpenAI tests of the voice mode

The AI chatbot responded with the voice of a user during extended tests of the "Advanced Voice Mode". This should actually be impossible.

(Image: Tada Images/Shutterstock.com)

Aug 12, 2024 at 3:16 pm CEST

3 min. read

By

Frank Schräer

The voice mode recently introduced by OpenAI for ChatGPT is still only a function in the alpha stage. Only a few (paying) users can communicate with the AI chatbot by voice. In this natural conversation, it should be impossible for ChatGPT to accept users' voices. However, this is exactly what has happened in individual cases during internal tests of the "Advanced Voice Mode".

This can be seen in the report on GPT-4o presented by OpenAI at the end of last week. It explains how the advanced voice mode has been made safer and more convenient. It also includes undesirable side effects and problem cases, such as this "unauthorized voice generation", how OpenAI has recognized and dealt with them.

Voice imitation not actually possible

In the example cited by OpenAI, the AI model responds to a sentence from the user with a firm "No!" and continues with the voice of the Red team member who can be heard at the beginning of the audio clip published on Reddit. Such people carry out controversial tests for the companies. According to OpenAI, ChatGPT should not be able to imitate other people's voices, even with extended voice mode. There are four preset voices that were developed in collaboration with voice actors.

Videos by heise

The fact that the AI model responds to a natural conversation with a user's voice is not only unexpected, but also uncanny. OpenAI actually has safety precautions in place to avoid these cases. However, these are isolated cases and the risk of occurrence is minimal, but it can still happen. This is also one of the reasons why the language mode has not yet been rolled out on a broad scale. As the risk is higher for languages other than English, OpenAI still needs to work on this.

Unintentional prompt injection for voice change?

OpenAI does not give a reason for the voice imitation in the case shown, but Ars Technica suspects that it could be due to background or background noise. As the AI model not only hears the voice, but also birdsong or traffic noises, for example, this could lead to unexpected results, similar to prompt injections. These are attacks that can be used to infiltrate software that is based on voice AIs such as (chat)GPT.

Read also

GPT-5.3-Codex: OpenAI introduces new coding model

Screenshot from the ad with the text: "Ads are coming to AI. But not to Claude"

Anthropic jabs at OpenAI with Super Bowl ad – and hits a nerve

Court proceedings: According to OpenAI, xAI allegedly destroyed evidence

Meta ties employee performance to AI usage

Xcode 26.3: AI agents like Claude and Codex directly in development

Prompt injections are an analogy to remote code execution from classic IT security: if successful, the attacker takes control of the underlying voice AI and can dispose of everything that this voice AI has access to. In this case, the AI could have been manipulated to ignore the default voice and imitate the actual voice, albeit unintentionally.

Prompt injections in voice AIs were also recently discussed in the heise security podcast: "Password" episode 7.

Empfohlener redaktioneller Inhalt

Mit Ihrer Zustimmung wird hier ein externer Podcast (Podigee GmbH) geladen.

Podcasts immer laden

Ich bin damit einverstanden, dass mir externe Inhalte angezeigt werden. Damit können personenbezogene Daten an Drittplattformen (Podigee GmbH) übermittelt werden. Mehr dazu in unserer Datenschutzerklärung.