GPT-4.1 is here: OpenAI releases new language models for coding and AI agents
GPT-4.1 is OpenAI's new AI model family for software development. In coding benchmarks, however, GPT-4.1 trails the competition from Google and Anthropic.
(Image: Novikov Aleksey/Shutterstock.com)
OpenAI has released GPT-4.1, a new family of language models: GPT-4.1, GPT-4.1 mini and GPT-4.1 nano. Compared to the previous GPT-4o and GPT-4o mini models, the new family is designed to produce better program code and follow instructions more closely, so GPT-4.1 is aimed primarily at software developers. The models have a knowledge cutoff of June 2024 and a context window of up to one million tokens, enough to process extensive inputs of roughly 750,000 words. OpenAI is making GPT-4.1 available only via the API; there are no plans to offer it in ChatGPT.
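Because the models are API-only, developers address them through the usual chat-completions endpoint. A minimal sketch, assuming the official openai Python SDK and an API key in the OPENAI_API_KEY environment variable; the model identifiers follow OpenAI's published naming:

```python
# Minimal sketch: calling the GPT-4.1 family via the OpenAI API.
# Assumes the official openai Python SDK (pip install openai) and an
# API key in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The three published model identifiers of the new family.
for model in ("gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano"):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user",
             "content": "Write a Python function that reverses a string."}
        ],
    )
    print(model, "->", response.choices[0].message.content[:80])
```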
GPT-4.1 behind Gemini and Claude Sonnet in coding benchmarks
For the improved coding capabilities, OpenAI points to the SWE-bench Verified benchmark, which tests language models on 500 programming tasks that human reviewers have verified as solvable. According to OpenAI, the large GPT-4.1 model solved around 55 percent of the problems. That puts it behind comparable models from competitors Google and Anthropic: Gemini 2.5 Pro and Claude 3.7 Sonnet both scored around 63 percent, while Deepseek V3 managed only 39 percent. Compared to other OpenAI models, however, GPT-4.1 is ahead: GPT-4o (as of November 2024) solved 33 percent of the tasks, GPT-4.5 38 percent, and OpenAI o3-mini 49 percent.
OpenAI also claims that the new model family is suited to front-end coding and that its program code requires less post-processing. GPT-4.1 can also be used when developing interfaces and is suited to revising individual code blocks rather than replacing the entire file. On this task, GPT-4.1 solved around 53 percent of the 225 problems across various programming languages in Aider's Polyglot benchmark, putting it behind OpenAI o1 and o3-mini, which each solved around 60 percent. The smaller GPT-4.1 mini solved 32 percent of the problems, well ahead of GPT-4o's 18 percent; the smallest model, GPT-4.1 nano, managed six percent.
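What such a block-wise edit looks like in practice depends entirely on the prompt; a minimal sketch, assuming the same chat-completions API and a hypothetical system prompt that asks the model to answer with a unified diff instead of a full file:

```python
# Sketch: asking GPT-4.1 to revise a single code block as a unified diff
# instead of rewriting the whole file. The system prompt and file content
# are illustrative assumptions, not part of OpenAI's announcement.
from openai import OpenAI

client = OpenAI()

original_file = '''def add(a, b):
    return a - b  # bug: should add, not subtract
'''

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system",
         "content": "You are a code editor. Reply only with a unified diff "
                    "that fixes the bug; do not repeat unchanged lines."},
        {"role": "user", "content": original_file},
    ],
)
print(response.choices[0].message.content)  # expected: a short diff hunk
```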
OpenAI: GPT-4.5 to make way for GPT-4.1
To find out how accurately language models follow the instructions they are given, OpenAI developed its own internal evaluation. According to it, GPT-4.1 is on a similar level to GPT-4.1 mini, GPT-4.5, o1 and o3-mini. GPT-4.1 nano performed significantly worse, but delivered values comparable to GPT-4o and GPT-4o mini. In the MultiChallenge benchmark, GPT-4.1 narrowly beats the smaller mini model but lags behind the reasoning models and GPT-4.5. In its announcement, the company writes that the new model family can be used to build AI agents that help with real-world software development tasks. At the same time, OpenAI plans to phase out the GPT-4.5 preview.
Overall, OpenAI advertises that the large GPT-4.1 model leads the family in the benchmarks, while the smaller models are faster and more efficient at the expense of some accuracy. GPT-4.1 costs two US dollars per million input tokens and eight US dollars per million output tokens. For GPT-4.1 mini, customers pay 0.40 US dollars per million input tokens and 1.60 US dollars for the same amount of output tokens. For the smallest model, GPT-4.1 nano, OpenAI charges 0.10 US dollars per million input tokens and 0.40 US dollars for the output tokens. The company also recently announced that corporate customers will have to verify their identity with an ID document to gain API access to certain language models.
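From these list prices, the cost of a request follows directly from the token counts. A small worked example using only the figures above; the token counts are made up for illustration:

```python
# Worked example: estimated request cost from the list prices above
# (US dollars per one million tokens, as stated in the article).
PRICES = {
    "gpt-4.1":      {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for one request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical request: a 10,000-token prompt with a 2,000-token answer.
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 10_000, 2_000):.4f}")
# gpt-4.1: $0.0360, gpt-4.1-mini: $0.0072, gpt-4.1-nano: $0.0018
```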
(sfe)