OpenAI o3: Revolutionary AI model with high computing costs and a hefty price tag

The o3 model from OpenAI has shown spectacular results in initial tests. It could also be expensive: there is talk of up to 1,000 US dollars per request.

Ever since OpenAI presented its latest AI models o3 and o3-mini at the end of December, the rumor mill has been buzzing. The reason: the models solved 85 percent of one of the currently most difficult tests for artificial intelligence, the Abstraction and Reasoning Corpus (ARC) benchmark. This is a real breakthrough, as the best programs had previously managed only around 35 percent.

ARC is particularly difficult for large language models: each task shows two examples of how abstract graphical patterns change, and the model has to infer the underlying rule and then apply it correctly to a third pattern. So far, however, o3 has only been run on part of the ARC puzzles.
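
To give a sense of the structure of such a task, here is a deliberately tiny, made-up example in Python. Real ARC puzzles are far more varied; the grids, the hidden rule and the solver below are purely illustrative.

```python
# Made-up, minimal ARC-style task: the example pairs demonstrate a hidden rule
# (here: mirror each grid horizontally), which then has to be applied to a new
# test input. Real ARC puzzles are considerably more varied.
examples = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[4, 5], [6, 7]], [[5, 4], [7, 6]]),
]

def mirror(grid):
    # Candidate rule: reverse every row.
    return [list(reversed(row)) for row in grid]

# The rule counts as "recognized" only if it reproduces every example pair.
assert all(mirror(inp) == expected for inp, expected in examples)

# Apply the inferred rule to the unseen test input.
print(mirror([[8, 9], [0, 1]]))  # [[9, 8], [1, 0]]
```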

In doing so, the AI consumed a great deal of computing time and caused correspondingly high costs: "thousands of US dollars" per task, as the organizers of the ARC Prize write. OpenAI has not yet published prices for o3, nor a date for the general market launch. But there is plenty of speculation online about whether a subscription to the new model would cost not just 200 dollars per month, as is currently the case with o1, but rather 2,000 dollars or more. Would o3 really be worth that price?

How o3 actually works is still a matter of speculation: so far, OpenAI has not published anything about the model's inner workings.

The only thing that is clear is that it is not simply an even larger model. For a long time, proponents of the so-called scaling hypothesis, above all OpenAI, believed that AI models trained on ever more data would keep becoming more powerful. However, scaling now appears to be reaching its limits. US media, citing anonymous sources at OpenAI, report that the performance leap in the next model generation, i.e. GPT-5 and its successors, will be smaller. The same seems to apply to Google. One reason given for this is the lack of sufficient, high-quality training data.

The AI industry responded with a strategy that has become known as "test-time compute". It addresses a central weakness of large language models: they always compute the next token that best fits the input, append that output to the prompt and repeat the procedure. This works for text, but not for complex problems in which the AI has to try out possible solutions step by step and start over when it reaches a dead end.
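
This generation loop can be sketched in a few lines of Python. The next_token function below is merely a dummy stand-in for a real language model; the snippet only illustrates the append-and-repeat mechanism described above.

```python
# Sketch of autoregressive decoding: the model predicts the next token,
# appends it to the prompt and repeats. next_token is a dummy stand-in
# for a real language model.
def next_token(context: list[str]) -> str:
    return "done" if context and context[-1] == "world" else "world"

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    context = list(prompt)
    for _ in range(max_tokens):
        token = next_token(context)
        context.append(token)      # the output becomes part of the next input
        if token == "done":        # a stop token ends the loop
            break
    return context

print(generate(["hello"]))  # ['hello', 'world', 'done']
```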

Models such as o3 or Gemini 2 first compute partial solutions and check their quality internally before moving on to the next step. Given a programming task, for example, such a model could first break the task down into sub-problems. It then writes the code for the first sub-problem and checks whether it runs at all; only then does it continue. To find the best possible answer, the models follow many different solution paths and then select the best one, as the sketch below illustrates. Of course, this does not only work for programming tasks.
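
A highly simplified sketch of this search over partial solutions might look like the following Python snippet. How o3 or Gemini 2 actually generate and evaluate candidates has not been published; propose_candidates and score are placeholders for those unknown internals.

```python
import random

def propose_candidates(subproblem: str, n: int = 4) -> list[str]:
    # Placeholder for sampling n different solution attempts from a model.
    return [f"{subproblem} -> attempt {i}" for i in range(n)]

def score(candidate: str) -> float:
    # Placeholder for an internal check, e.g. "does the generated code run?".
    return random.random()

def solve(task: list[str]) -> list[str]:
    solution = []
    for subproblem in task:                  # task already split into steps
        candidates = propose_candidates(subproblem)
        best = max(candidates, key=score)    # keep only the best attempt
        solution.append(best)                # then move on to the next step
    return solution

print(solve(["parse input", "compute result", "format output"]))
```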

This would explain why these special models are so expensive, not only to train but also to run: a single query is internally turned into thousands of slightly different partial queries that users never get to see. According to OpenAI, o3 can also automatically adapt the computing effort to the complexity of the task at hand.
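
How o3 decides how much compute a task deserves is not public; the following hypothetical snippet merely illustrates the idea of scaling the number of sampled attempts with an estimated difficulty.

```python
# Hypothetical illustration of adaptive test-time compute: sample more
# attempts for tasks that look harder. The difficulty estimate is a crude
# stand-in; OpenAI has not disclosed how o3 does this.
def samples_for(task: str, base: int = 4, cap: int = 64) -> int:
    difficulty = min(len(task) // 20, 4)
    return min(base * 2 ** difficulty, cap)

print(samples_for("sort a list"))  # short, easy-looking task: 4 samples
print(samples_for("prove the identity for all n and give a counterexample if it fails"))  # 32 samples
```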

At its core, however, it is still a large language model that is working on the problem. This means that even with o3, there is no guarantee that the solution is actually correct: there is no genuine logical or mathematical verification of the result, and the model still runs the risk of hallucinating.

This article first appeared on t3n.de.

(mki)

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.