OpenAI: New Audio Models for Real-time AI Support
OpenAI releases three new audio models for its API: GPT-Realtime-2 for real-time conversations, GPT-Realtime-Translate for live translation, and GPT-Realtime-Whisper for live transcription.
(Image: Henry Franklin/Shutterstock.com)
Artificial intelligence will increasingly be on the other end of the line in the future when people call a support hotline or seek assistance in an app. With three new audio models available via its developer interface (API), OpenAI now wants to raise the quality of such interactions. Specifically, the US-based AI company has introduced the GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper models.
As the names suggest, the trio divides up the tasks: GPT-Realtime-2 is intended to enable real-time conversations between machine and human, GPT-Realtime-Translate acts as an interpreter in human-to-human communication, and GPT-Realtime-Whisper transcribes human speech for machine processing. Furthermore, GPT-Realtime-2 is the first language model with real-time GPT-5 reasoning. OpenAI recently also introduced GPT-5.5 as an agentic work model, which is designed to plan tasks independently and process them consistently over longer periods.
AI Model Becomes More Conversational
In practical videos accompanying the announcement, OpenAI demonstrates the models in action. One focus is on the AI integrating better into human communication. For example, in one scene someone interrupts a human-AI conversation and the AI is instructed to hold on for a moment. The AI's responses also sound more human: in how number and letter sequences are pronounced, or, in live translation, in how the AI waits until it has heard enough to translate meaningfully. The models are also meant to flag problems explicitly instead of simply failing silently.
The context window of GPT-Realtime-2 has been expanded from 32,000 to 128,000 tokens compared to the previous model, GPT-Realtime-1.5. Reasoning levels are adjustable from minimal to very high, with the default set to low. Parallel tool calls are also possible, allowing the model to query multiple external services concurrently during an ongoing conversation. OpenAI also reports significantly improved benchmark results: on Big Bench Audio, the score rises from 81.4 to 96.6 percent compared to GPT-Realtime-1.5. With the general release of the Realtime API last year, the predecessor model had already lifted this benchmark from around 65 to over 82 percent compared to the beta version.
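Reasoning level and parallel tool calls are per-session settings. As a rough illustration, a session-update event along the lines of what the Realtime API's WebSocket interface expects might look like the following; note that the field names `reasoning_effort` and `parallel_tool_calls`, and the example tool, are illustrative assumptions rather than the documented schema for these new models:

```python
import json

# Hypothetical session configuration for a GPT-Realtime-2 conversation.
# Follows the Realtime API pattern of sending "session.update" events
# over a WebSocket; the exact field names are assumptions.
session_update = {
    "type": "session.update",
    "session": {
        "model": "gpt-realtime-2",
        # Adjustable from minimal to very high; low is the default.
        "reasoning_effort": "low",
        # Let the model query several external services concurrently.
        "parallel_tool_calls": True,
        "tools": [
            {
                "type": "function",
                "name": "lookup_order_status",  # hypothetical support tool
                "description": "Fetch the status of a customer order.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            }
        ],
    },
}

# Serialized payload that would be sent over the WebSocket connection.
payload = json.dumps(session_update)
```

In a real client, this payload would be sent once at the start of the session; tool-call results then flow back as separate events while the audio conversation continues.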
GPT-Realtime-Translate supports over 70 input languages and can translate into 13 languages. According to OpenAI, Deutsche Telekom is already testing the model for use in multilingual customer support. The cost for developers is 0.034 US dollars per minute of use.
Prices Remain the Same
GPT-Realtime Whisper is intended to enable live transcription with very low latency. Typical use cases include subtitles in meetings or streams, customer support, medical applications, and retail. The cost is 0.017 US dollars per minute.
All three models are available immediately via the Realtime API. The new models align with OpenAI's recent strategy of specialized AI models: in addition to language processing, the company has recently also introduced GPT-Rosalind for biological research, which is tailored for drug discovery and genomics. Using GPT-Realtime-2 costs 32 US dollars per million tokens for input (0.40 US dollars for cached tokens) and 64 US dollars per million tokens for output. This means prices remain unchanged compared to the previous model. Relevant for European developers: The Realtime API supports EU Data Residency, meaning requests and responses are processed within the EU and not stored on OpenAI's servers. There is one caveat, however: Tracing, i.e., tracking API calls for debugging purposes, is not yet EU Data Residency compliant.
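The figures quoted above can be turned into a rough cost estimate. The following sketch uses only the prices from the announcement; the token counts in the example exchange are made-up placeholders, since the article gives no typical tokens-per-minute figures:

```python
# Prices from the announcement (US dollars).
PRICE_TRANSLATE_PER_MIN = 0.034   # GPT-Realtime-Translate, per minute
PRICE_WHISPER_PER_MIN = 0.017     # GPT-Realtime-Whisper, per minute
PRICE_R2_INPUT_PER_M = 32.0       # GPT-Realtime-2, per million input tokens
PRICE_R2_CACHED_PER_M = 0.40      # per million cached input tokens
PRICE_R2_OUTPUT_PER_M = 64.0      # per million output tokens

def realtime2_cost(input_tokens: int, cached_tokens: int,
                   output_tokens: int) -> float:
    """Cost in US dollars for one GPT-Realtime-2 exchange."""
    return (
        input_tokens / 1e6 * PRICE_R2_INPUT_PER_M
        + cached_tokens / 1e6 * PRICE_R2_CACHED_PER_M
        + output_tokens / 1e6 * PRICE_R2_OUTPUT_PER_M
    )

# One hour of live translation vs. live transcription.
translate_hour = 60 * PRICE_TRANSLATE_PER_MIN
whisper_hour = 60 * PRICE_WHISPER_PER_MIN

# Example conversation: 50k fresh input, 200k cached input, 30k output
# tokens (placeholder numbers for illustration only).
example = realtime2_cost(50_000, 200_000, 30_000)
```

The exercise makes the pricing model visible: the per-minute models have flat, predictable costs, while GPT-Realtime-2 costs scale with conversation length and how much of the context can be served from the cache.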
(mki)