Model Showcase: Reasoning from China, Liquid Models, new Microsoft world

As LLMs increasingly work multimodally, there are exciting developments toward higher performance and smaller model sizes.

By Dr. Christian Winkler

As summer begins, things are heating up in the world of language models too. New Chinese models from StepFun and MiniMax promise affordable reasoning and are optimized for agentic workflows. The Liquid Foundation Models are very compact due to their special architecture, yet still powerful.

Prof. Christian Winkler is a data scientist and machine learning architect. He holds a PhD in theoretical physics and has been working in the field of big data and artificial intelligence for 20 years, with a particular focus on scalable systems and intelligent algorithms for mass text processing. He has been a professor at the Nuremberg Institute of Technology since 2022. His research focuses on the optimization of user experience using modern methods. He is the founder of datanizing GmbH, a speaker at conferences, and an author of articles on machine learning and text analytics.

Nvidia continues its upward trend and has added some new models to its portfolio, although the largest of them is merely an announcement. Finally, at the beginning of June, Microsoft unveiled a whole series of (unfortunately closed) models at the Build conference, further emancipating itself from OpenAI.

The Shanghai-based AI company StepFun has followed up its successful 3.5 model from the spring with a new reasoning model. It is again a Flash model with an architecture similar to its predecessor's, but it has been improved in several crucial respects. For example, StepFun has added a vision encoder, so Step 3.7 Flash can also understand images. Reasoning can now be configured so that simple questions do not immediately consume a large number of tokens. This is particularly helpful for agentic use.

Like many Chinese models, Step 3.5 Flash was heavily censored. Version 3.7 is not much different, but interestingly, the model readily provides facts in the reasoning section, only to be curbed in the final answer. Guardrails that are trained into the model in the last step certainly play a crucial role here. Apart from that, the answers are mostly correct. It is particularly interesting that the reasoning for German questions mostly takes place in German, with only interjections like “wait” being in English. Almost all other models behave differently and reason only in English.

It is difficult to say whether the model is really much better than its predecessor. In any case, the community has praised it, especially for use with coding agents. The figures on the StepFun website are significantly better than those for the older model, and according to them it often outperforms DeepSeek V4 Flash. LM Arena will eventually show how the model holds up in real-world use.

The results of Step 3.7 Flash can be found in the GitHub repository for this article.

Although MiniMax describes its M3 model as “Open Weight,” the weights are not yet downloadable on Hugging Face. Hopefully, this will change soon. The model can be tried out either directly at MiniMax.ai or via OpenRouter. As is typical for MiniMax, the results are more balanced and less censored than those of other Chinese models.

Like many providers, MiniMax has optimized the attention architecture, but it has gone its own way. Attention is calculated in two phases: the first phase decides which tokens are important and then passes them on for full attention calculation in the second phase. MiniMax claims that the M3 model can evaluate prompts almost ten times faster than MiniMax M2 and is even 15 times faster in generation. That would be a huge leap forward. Whether it proves true remains to be seen when the models can be run locally.
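The two-phase idea described above can be illustrated with a minimal sketch. It is not MiniMax's implementation, whose details the article does not give. It assumes a single query vector, a hypothetical cheap selector that scores tokens using only the first `index_dim` components of each key, and a hypothetical `top_k` parameter for how many tokens survive the first phase.

```python
import math


def dot(a, b):
    """Plain dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_phase_attention(query, keys, values, top_k, index_dim):
    # Phase 1 (assumed selector): score every token cheaply, using only
    # the first `index_dim` components, and keep the top_k best tokens.
    cheap = [dot(query[:index_dim], key[:index_dim]) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: cheap[i], reverse=True)
    selected = sorted(ranked[:top_k])  # restore original token order

    # Phase 2: full scaled dot-product attention, restricted to the
    # tokens selected in phase 1.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in selected])
    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]
    return selected, output


query = [1.0, 0.0, 0.0, 0.0]
keys = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
selected, output = two_phase_attention(query, keys, values, top_k=2, index_dim=2)
print(selected, output)
```

The expensive second phase only touches `top_k` tokens instead of the whole context, which is where a speed-up of this kind of scheme would come from. How MiniMax actually selects tokens may differ considerably.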

Publicly available benchmarks do not exist yet, but MiniMax's own data is promising. Especially in the coding domain, it can compete with the best models from Anthropic, if the data is correct.

The results of MiniMax M3 can be found in the GitHub repository for this article.

Liquid.ai takes an entirely different approach and uses a different architecture for its Liquid Foundation Models. It generates tokens extremely efficiently, so these models also perform well on CPUs. There are now several such models; the latest addition is LFM2.5-8B-A1B, which has only one billion active parameters. It aims to compete with much larger models like gpt-oss-20b, Qwen3-30B-A3B-Thinking-2507, and Gemma-4-26B-A4B-IT. Apart from Gemma, however, the models used for comparison are somewhat older.

LFM2.5-8B-A1B is extremely fast: on a Mac Studio M2 Ultra, it achieved almost 200 tokens per second. The results cannot quite match the large models, but for specialized applications or agentic scenarios, the model could be suitable.
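To put the measured speed into perspective, a rough back-of-the-envelope bound helps. The sketch below assumes that decoding is limited by memory bandwidth, that every active weight is read once per generated token, and that the M2 Ultra reaches the 800 GB/s that Apple specifies. The bytes per parameter are assumptions too, since the article does not say which quantization was used. KV cache traffic and other overhead are ignored.

```python
def decode_upper_bound(bandwidth_gb_s, active_params_billion, bytes_per_param):
    """Rough upper bound on tokens per second for bandwidth-bound decoding.

    Assumes every active weight is read exactly once per generated token.
    """
    gigabytes_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gigabytes_per_token


M2_ULTRA_BANDWIDTH_GB_S = 800   # Apple's specification, used as an assumption
ACTIVE_PARAMS_BILLION = 1       # "A1B": one billion active parameters

for label, bytes_per_param in [("16 bit", 2.0), ("8 bit", 1.0), ("4 bit", 0.5)]:
    bound = decode_upper_bound(
        M2_ULTRA_BANDWIDTH_GB_S, ACTIVE_PARAMS_BILLION, bytes_per_param
    )
    print(f"{label}: at most about {bound:.0f} tokens per second")
```

Under these assumptions, the almost 200 tokens per second measured here stay well below the theoretical limit, which is plausible because real inference never reaches the full memory bandwidth.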

The results of LFM2.5-8B-A1B can be found in the GitHub repository for this article.

Nvidia continues its upward trend, and this now shows in its models as well. One popular example is LocateAnything, which analyzes images and returns boxes indicating where specific objects are located. Processing runs highly in parallel across all identified boxes; the model can even analyze scanned documents and find the boxes that hold content. This is useful, for example, for identifying GUI elements and controlling a browser via agents. At just under eight GB, the model is relatively small and should also run on consumer GPUs.
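How an agent might use such boxes can be sketched in a few lines. The `Box` class and the `click_target` helper are hypothetical and do not reflect LocateAnything's actual output format. The sketch simply assumes each detection arrives as a label plus pixel coordinates of the box corners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical detection result: a label and a box in pixel coordinates."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def center(self):
        """Return the midpoint of the box, a natural place to click."""
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def click_target(boxes, label):
    """Return the center of the first box with the given label, or None."""
    for box in boxes:
        if box.label == label:
            return box.center()
    return None


detections = [
    Box("search field", 100, 20, 500, 60),
    Box("submit button", 520, 20, 600, 60),
]
print(click_target(detections, "submit button"))
```

A browser agent could pass the returned coordinates to its click action and fall back to another strategy when the helper returns `None`.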

The Pixel Diffusion Decoder requires significantly more memory and introduces a novel diffusion model in pixel space. Operation is currently very cumbersome: one must download various checkpoints from the Hugging Face page and process them with a specially provided program. Whether and how much better Nvidia can generate images with this compared to conventional diffusion models remains to be seen.

The Nemotron models were already powerful. Even the Nano model, however, has over 30 billion parameters, of which three billion are active. The Super model, released about three months ago, uses as many as 120 billion parameters, of which twelve billion are active. Now the Ultra model is available with 550 billion parameters, of which “only” 55 billion are active. Nvidia claims to achieve significantly faster inference with this, possibly due to the NVFP4 data type used in the model.
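The parameter counts above translate into a simple calculation. The following sketch assumes that NVFP4 stores roughly four bits per weight and ignores the overhead for scaling factors. The Nano value of 30 billion is a lower bound, since the article only says "over 30 billion". The names `MODELS`, `active_fraction`, and `weight_gigabytes` are made up for this illustration.

```python
# (total parameters, active parameters) as stated in the article
MODELS = {
    "Nano": (30e9, 3e9),     # "over 30 billion": 30e9 is a lower bound
    "Super": (120e9, 12e9),
    "Ultra": (550e9, 55e9),
}


def active_fraction(total, active):
    """Share of the parameters that is used for each token."""
    return active / total


def weight_gigabytes(params, bits_per_param):
    """Approximate size of the weights in GB, without any overhead."""
    return params * bits_per_param / 8 / 1e9


for name, (total, active) in MODELS.items():
    print(
        f"{name}: {active_fraction(total, active):.0%} active, "
        f"about {weight_gigabytes(total, 4):.0f} GB at 4 bits, "
        f"about {weight_gigabytes(total, 16):.0f} GB at 16 bits"
    )
```

All three models activate about a tenth of their parameters per token. Under the four-bit assumption, the Ultra weights alone would occupy roughly a quarter of the space they would need at 16 bits, which also reduces the amount of data that has to be moved during inference.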

The optimized attention mechanism with many Mamba layers also contributes, enabling a context length of up to one million tokens. In terms of performance, Nemotron 3 Ultra does not quite match the open Chinese models, but the final version has only recently become available. As with all Nemotron models, Nvidia provides a large portion of the training data, training code, and other content. This makes these models by far the most open – in terms of transparency. Only the much smaller Olmo or Apertus models, not originating from Nvidia, are similarly open.

The model's Western (American) origin is clearly noticeable in its answers. Where Chinese models politely refrain, this model often expresses much clearer, politically more neutral, or at least differently colored opinions.

The results of Nemotron 3 Ultra can be found in the GitHub repository for this article.

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.