Between scaremongering and brochure: OpenAI report on AI in companies

OpenAI's report on the state of AI in companies ranges from unspecific to misleading, argues Philipp Steevens. The figures needed for informed decisions are missing.

Smartphone screen with ChatGPT icon (Image: Tada Images/Shutterstock.com)

OpenAI's report "State of Enterprise AI" amounts to a hymn of praise for the company's own tools, underpinned by unspecific figures and gut-feeling correlations. The usage numbers are all relative claims without absolute baselines or totals. It is hardly surprising that power users use the tools more intensively and more often, but here too, concrete figures and any connection to the economy as a whole are missing, because the report covers only customers on annual subscriptions. It also remains unclear how many subscriptions have no active users at all. Measurability, too, is nowhere in sight: users report efficiency gains and saved hours, while OpenAI can only measure tokens consumed. Concrete figures are missing at both ends, precisely where they would be needed to estimate a cost-benefit ratio and make an informed economic decision.

A saving of more than ten working hours through ChatGPT goes hand in hand with the consumption of 1,200 credits, a unit that does not appear as an actual billing measure anywhere in OpenAI's price tables, which instead list subscriptions, usage limits, and API tokens. While no information on the limits of the Enterprise version could be found on OpenAI's help pages, the use of GPT-5 Pro in the Business plan is capped at 15 requests per user per month, with additional purchases possible. In the report, OpenAI describes credits only vaguely as an approximation of usage, with advanced features consuming more: "Credits map to usage, with more advanced features like Codex and Deep Research consuming a higher number of credits."

Even where OpenAI does specify tokens, the exact costs cannot be determined. Input tokens are cheaper; the real costs lie in the models' output tokens. Reasoning in particular generates many of them, because the entire reasoning process that precedes the actual answer consists of output tokens. In an agent system, for example via MCP, these tokens can in turn become input from which another model generates new output. While input and output for GPT-5.1 cost US$1.25 and US$10 per million tokens respectively, GPT-5 Pro costs $15 and $120 per million input and output tokens. Even if companies consume 10 billion or a trillion tokens, OpenAI's publication still does not yield an exact price. One could estimate and average, but the mix of tokens varies by field of application: for information retrieval, users feed many tokens into the system and expect a short answer, whereas reasoning can produce a lot of output from a short input.
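How strongly the input/output mix sways the bill can be sketched with a little arithmetic. The per-million-token prices below are the ones cited above; the 70/30 splits are purely illustrative assumptions, since, as noted, the real ratio varies by application.

```python
# Illustrative cost estimate from the per-million-token prices cited in the
# text (GPT-5.1: $1.25 in / $10 out; GPT-5 Pro: $15 in / $120 out).
# The input/output splits are hypothetical assumptions, not measured data.

PRICES = {  # USD per 1 million tokens: (input, output)
    "gpt-5.1": (1.25, 10.0),
    "gpt-5-pro": (15.0, 120.0),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost for a given token volume at list prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# 10 billion tokens on GPT-5.1, assumed 70 % input / 30 % output
# (retrieval-heavy: long context in, short answers out):
retrieval = estimate_cost("gpt-5.1", 7_000_000_000, 3_000_000_000)

# Same total volume, assumed 30 % input / 70 % output (reasoning-heavy):
reasoning = estimate_cost("gpt-5.1", 3_000_000_000, 7_000_000_000)

print(f"retrieval-heavy: ${retrieval:,.0f}")   # $38,750
print(f"reasoning-heavy: ${reasoning:,.0f}")   # $73,750
```

Identical token volume, nearly double the cost, and that is before the Pro tier's twelvefold output price enters the picture, which is exactly why an average price per token says so little.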


The large language models are evidently getting better. Information retrieval with RAG works well, FAQ bots relieve service hotlines, programmers can quickly generate prototypes, and MCP links different interfaces. What is missing, however, is a business case that captures the full complexity of specialist work. In most examples, LLMs collect information from aggregated or supplied documents and pass the results on to specialists or to customers seeking help. Expensive technology assists already expensive employees, or call centers are partially automated.

Companies, especially those already lagging behind in digitalization, cannot simply catch up by deploying chatbots. Elaborate fine-tuning of a model triggers fear of missing out when a better base model appears two months later and is then glorified across forums, social networks, and traditional media. Those who want to stay flexible build a tech stack and pipelines around an interface; even then, LLMs cannot really be used as drop-in replacements. Each model and each model version has its own characteristics and weaknesses, to which the stack then has to be adapted yet again. External service providers promise AI but first have to lay the groundwork for digital processes.

The biggest problem, however, is that efficiency gains can hardly be measured. Reports of saved hours stand against the opaque business models of AI providers, who demonstrably cannot cover their costs at present and have not yet passed the systems' real costs on to customers, whether through subscription models or opaque pricing. Meanwhile, AI providers are planning infrastructure projects worth billions that also need to be recouped. A Cursor-Anthropic situation, in which a provider suddenly raises prices or throttles capacity, could occur at any time. The goal of the providers of the most widespread tools is vendor lock-in. For everyday users, those providers are above all OpenAI and Google; among developers, Microsoft and Anthropic are well ahead. Building or expanding one's own infrastructure for running generative AI models also entails massive costs, especially since, beyond expensive AI accelerators, other hardware is currently driving up prices as well.

OpenAI's report tries to scare decision-makers into believing their company will otherwise lose touch economically, while at the same time glossing over reality with incomprehensible figures. Simply subscribing to ChatGPT Enterprise will not help companies catch up on digitalization, and neither will burning thousands of euros on tokens and credits to save a few working hours.

(pst)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.