OpenAI introduces GPT-5.5: More agent, less chatbot
OpenAI positions GPT-5.5 as an agentic work model with top scores in coding. However, some of the benchmarks lack comparisons with the competition.
(Image: Prathmesh T/Shutterstock.com)
Is it Thursday again already? OpenAI has introduced its next language model: GPT-5.5 is meant to be less a chatbot and more an independently working AI agent. According to the company, the model is designed to plan tasks on its own, use tools, check intermediate results, and work consistently over longer periods. GPT-5.5 thus replaces the previous flagship model, GPT-5.4, which was released only at the beginning of March.
The focus is on software development, research, data analysis, and operating software via interfaces. Despite higher performance, the response speed per token is said to remain identical to that of GPT-5.4, according to the OpenAI blog. OpenAI cites optimizations across the entire infrastructure, including AI-assisted load balancing, as the reason – however, the company does not provide technical details on the specific implementation. Furthermore, GPT-5.5 is said to consume significantly fewer tokens than its predecessor for the same tasks.
Top scores in agentic coding
According to OpenAI, the model performs particularly well in so-called agentic coding, i.e., the independent processing of complex development tasks including planning, debugging, and tool usage. On the announcement page for GPT-5.5, OpenAI shows several examples, including an earthquake tracker, two simple 3D games, and an interactive visualization of a moon mission.
On Terminal-Bench 2.0, a benchmark for multi-stage command-line workflows, GPT-5.5 achieves an accuracy of 82.7 percent. This puts it ahead of Claude Opus 4.7 (69.4 percent) and Gemini 3.1 Pro (68.5 percent). On the Artificial Analysis Coding Index, GPT-5.5 is said to deliver the same performance as competing models at half the cost.
Fortunately, OpenAI clearly lists all benchmarks in a table, comparing them to its own predecessors as well as Opus 4.7 and Gemini 3.1 Pro.
(Image: OpenAI)
There is also progress in desktop control via screenshots, which OpenAI refers to as "Computer Use": In the OSWorld-Verified benchmark, GPT-5.5 achieves 78.7 percent, putting it narrowly ahead of Claude Opus 4.7 with 78.0 percent. Anthropic released its latest model, Opus 4.7, just one week before GPT-5.5, primarily emphasizing improved instruction following.
Benchmark comparisons with gaps
A closer look at the performance data published by OpenAI reveals that comparability is limited. Several benchmarks do not include values for competing models. In the internal Expert-SWE benchmark, for example, GPT-5.5 competes exclusively against its own predecessor – external reference values are missing entirely. The tables for Toolathlon and CyberGym are also incomplete.
Where external models are included, a more nuanced picture emerges. In the knowledge work benchmark GDPval, GPT-5.5 achieves the top score with 84.9 percent, but is only slightly ahead of GPT-5.4 (83.0 percent) and Claude Opus 4.7 (80.3 percent). In BrowseComp, a test for multi-step web research, Gemini 3.1 Pro with 85.9 percent even beats the base model GPT-5.5 (84.4 percent) – only the Pro version pulls ahead with 90.1 percent. Independent tests will be needed for a reliable assessment of the actual performance.
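To put the gaps between the figures cited in this article into perspective, the leads can be calculated in percentage points. The following is only a minimal sketch: the scores are the values reported by OpenAI as quoted here, and the names SCORES, lead_over_runner_up, and margins are hypothetical helpers chosen for illustration, not part of any OpenAI tool.

```python
# Scores in percent as quoted in this article (figures reported by OpenAI).
# Benchmarks without competitor values are left out on purpose.
SCORES = {
    "Terminal-Bench 2.0": {"GPT-5.5": 82.7, "Claude Opus 4.7": 69.4, "Gemini 3.1 Pro": 68.5},
    "OSWorld-Verified": {"GPT-5.5": 78.7, "Claude Opus 4.7": 78.0},
    "GDPval": {"GPT-5.5": 84.9, "GPT-5.4": 83.0, "Claude Opus 4.7": 80.3},
    "BrowseComp": {"GPT-5.5": 84.4, "GPT-5.5 Pro": 90.1, "Gemini 3.1 Pro": 85.9},
}


def lead_over_runner_up(results):
    """Return the top-scoring model and its lead over second place in percentage points."""
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
    (leader, top), (_, second) = ranked[0], ranked[1]
    return leader, round(top - second, 1)


def margins(scores):
    """Map each benchmark to (leader, lead in percentage points)."""
    return {bench: lead_over_runner_up(results) for bench, results in scores.items()}


if __name__ == "__main__":
    for bench, (leader, gap) in margins(SCORES).items():
        print(f"{bench}: {leader} leads by {gap} percentage points")
```

Calculated this way, the lead on Terminal-Bench 2.0 is 13.3 percentage points, while on OSWorld-Verified it shrinks to 0.7 and on GDPval to 1.9.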
Specialized models as a strategy
GPT-5.5 joins a series of rapid releases with which OpenAI has recently differentiated its model offering. Just last week, the company introduced an improved image model with a thinking mode. A few days earlier, GPT-Rosalind was released, a model specialized in biological research. And as early as mid-April, OpenAI announced GPT-5.4-Cyber, a variant with relaxed security restrictions for verified security researchers.
Regarding security, OpenAI emphasizes that GPT-5.5 comes with its most extensive protective measures to date. Before the release, the company says it specifically tested the model's enhanced cybersecurity and biology capabilities, conducted internal and external red teaming, and gathered feedback from around 200 early access partners. Selected users receive extended access to security-relevant functions through a "Trusted Access" program – a concept that OpenAI had already established with GPT-5.4-Cyber.
GPT-5.5 is initially available for Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. The Pro version, GPT-5.5 Pro, is limited to Pro, Business, and Enterprise accounts. OpenAI has announced a general API release but has not yet provided a date. The company has not yet commented on pricing in Europe or GDPR compliance.
(vza)