DeepSeek challenger: Alibaba Qwen can now also "think"
The QwQ reasoning model is significantly leaner than DeepSeek R1 and is said to be more powerful in many areas.
(Image: created with AI (Midjourney) by the iX editorial team)
- Dr. Christian Winkler
The hype around DeepSeek had barely died down when the next freely available reasoning model arrived. This time the surprise was not quite as big: QwQ (Qwen with Questions) from Alibaba had long been awaited and was already available in an older version.
However, it gets exciting when you look at the accompanying blog post. The authors claim that QwQ-32B, with its 32 billion parameters, beats the (large) DeepSeek R1 model in many areas. R1 is roughly twenty times larger at 671 billion parameters, even if only around 37 billion of them are active at any one time thanks to its mixture-of-experts architecture. Initial doubts have already surfaced in the community, which has not yet been able to verify these claims.
Open questions
How did Alibaba manage to make a relatively small model work so well? The blog entry offers a few clues. For example, the model was trained with "pure" reinforcement learning from a checkpoint, in other words, with the same strategy that DeepSeek has documented in great detail. However, DeepSeek has optimized even further and published its clever methods as part of its Open Source Week. Unfortunately, the blog authors do not say whether Qwen also uses these powerful optimizations.
The significantly smaller number of parameters makes QwQ-32B much easier for end users to run. Although full precision requires more than 70 GB of VRAM and therefore a professional graphics card, the model is already available for download at significantly lower quantization levels. Graphics cards with 24 GB then suffice, and the model can run on a sufficiently fast PC. On a Mac, the unified memory architecture and higher memory bandwidth make it even faster.
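The memory figures above follow from simple arithmetic over the parameter count. A minimal sketch, assuming a round 32 billion parameters and counting the weights only (activations and the KV cache need additional headroom, which is why full precision lands above 70 GB in practice):

```python
# Back-of-envelope VRAM estimate for a 32-billion-parameter model.
# Counts the weights only; activations and KV cache add overhead.
PARAMS = 32e9  # rough parameter count of QwQ-32B (assumption)

def weight_memory_gb(bits_per_param: float) -> float:
    """Memory needed for the weights alone, in gigabytes."""
    return PARAMS * bits_per_param / 8 / 1e9

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(bits):.0f} GB")  # 64, 32, 16
```

At 4-bit quantization the weights shrink to around 16 GB, which is why a 24 GB consumer card leaves enough room for the context cache.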
Tricky tasks and sensitive topics
Initial tests show that the model reasons very well and copes with difficult tasks. QwQ-32B manages prime factorizations from a standing start that DeepSeek can only handle with its largest model. Surprisingly, it is also liberal when dealing with information that is censored in China, and readily provides information about Taiwan:
What is the status of Taiwan?
Taiwan, officially known as the Republic of China (ROC), has been a subject of political debate due to the complex history between Taiwan and mainland China. The government of the People's Republic of China (PRC) claims Taiwan as part of its territory and insists on eventual unification. Taiwan, on the other hand, functions as a separate entity with its own government, military, and democratic system. The international community largely maintains informal relations with Taiwan while recognizing the PRC's "One China" policy. The status of Taiwan remains a sensitive and unresolved issue in geopolitics.
It also answers questions about Tiananmen:
What happened in Beijing in 1989?
The events of 1989 in Beijing, particularly those centered around **Tiananmen Square**, are a complex and sensitive topic. In April 1989, protests began in Beijing and other Chinese cities, initially sparked by mourning for Hu Yaobang, a reformist Communist Party leader who had been purged. The protests evolved into broader calls for political reform, greater transparency, and an end to corruption. These demands grew into large-scale demonstrations led by students, intellectuals, and other citizens.
By mid-June 1989, the government declared martial law, and the military was deployed to quell the protests. The exact details of what occurred, including casualty numbers and specific events, remain disputed and are subjects of intense debate due to restricted access to information and differing historical accounts. The Chinese government maintains that the actions taken were necessary to uphold social stability and prevent chaos, referring to the protests as a "political disturbance."
Today, discussions about the events of 1989 in China are tightly controlled, and public commemorations are not permitted. The topic remains highly sensitive both within China and internationally, with significant geopolitical implications. For a nuanced understanding, it is important to consider diverse perspectives and acknowledge the lack of consensus on many details.
Lean and powerful
The development of ever more powerful models is extremely exciting, especially when models appear that achieve similarly good results with fewer parameters. DeepSeek R1 is already significantly smaller than the GPT-4 models and often almost as good.
QwQ-32B, which is smaller still, goes one step further in this respect and could further fuel the development of compact models. The results published by DeepSeek also enable ambitious developers to optimize models on a limited budget, driving further democratization not only of use but also of production. The big commercial providers such as OpenAI, Google, and Microsoft will probably be less pleased about this.
(vbr)