Deepseek from China puts Silicon Valley under pressure

Panic mode at AI companies and on the stock market: Deepseek's Chinese AI models are cost-effective and powerful.

[Image: Server room in red light. Caption: Emergency in the data center. Credit: vchal/Shutterstock.com]

Meta is said to have set up a crisis team. Nvidia's share price is falling. OpenAI is under pressure. Deepseek from China offers AI models and a chatbot that are said to at least match the current models from Silicon Valley. Training them, however, is said to have been significantly faster and cheaper. Access to the models is also cheaper for customers.

The models were actually released a few weeks ago, but Deepseek has suddenly become the most downloaded app in the app store and is attracting a lot of attention. This could be partly because Silicon Valley investor Marc Andreessen has only now described the service on X as one of the “most impressive breakthroughs” he has ever seen.

Deepseek has released the R1 and V3 models. Even V3 is said to exceed the performance of GPT-4o and Anthropic's Claude 3.5 in some benchmarks. And this is despite the fact that its development is said to have cost only a fraction as much. Specifically, the pure training costs are said to have been 5.6 million US dollars for 2.78 million GPU hours, as the company writes on its own website. Meta's Llama, with around 400 billion parameters, required around eleven times as many GPU hours. Deepseek R1 is a reasoning model that is supposed to be able to keep up with OpenAI's o1. Both models are freely available under the MIT license.
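The training figures quoted above can be put into proportion with a little arithmetic. The following is a minimal sketch that uses only the numbers from the article; the function and constant names are hypothetical, and the derived per-hour rate and the Llama total are rough estimates, not figures reported by Deepseek or Meta.

```python
# Figures as quoted in the article (Deepseek's own statements).
TRAINING_COST_USD = 5.6e6      # "5.6 million US dollars in pure training costs"
DEEPSEEK_GPU_HOURS = 2.78e6    # "2.78 million GPU hours"
LLAMA_GPU_HOUR_FACTOR = 11     # "around eleven times as many GPU hours"


def cost_per_gpu_hour(total_cost_usd, gpu_hours):
    """Derived rate: total training cost divided by GPU hours."""
    return total_cost_usd / gpu_hours


def implied_gpu_hours(base_gpu_hours, factor):
    """Rough GPU-hour total implied by a 'factor times as many' statement."""
    return base_gpu_hours * factor


if __name__ == "__main__":
    rate = cost_per_gpu_hour(TRAINING_COST_USD, DEEPSEEK_GPU_HOURS)
    llama_hours = implied_gpu_hours(DEEPSEEK_GPU_HOURS, LLAMA_GPU_HOUR_FACTOR)
    print(f"Implied cost per GPU hour: {rate:.2f} USD")
    print(f"Implied Llama GPU hours: {llama_hours / 1e6:.1f} million")
```

The quoted cost therefore corresponds to roughly 2 US dollars per GPU hour. It covers compute time for the training run only, not hardware purchases, salaries or earlier experiments.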

Nevertheless, it is not entirely clear how the provider was able to develop the models so cost-effectively. One puzzle is that, due to US trade restrictions, Deepseek should not have had access to sufficiently powerful chips for AI training. However, there are reports that the founder bought enough Nvidia A100 GPUs for his hobby years ago, which he can now use. The Financial Times has written a short portrait of Liang Wenfeng. According to the article, he is a former hedge fund manager with a soft spot for AI. He is said to have founded Deepseek in May 2023.

However, according to the article, the company is completely cross-subsidized. Liang is said to have stated that he is not pursuing any commercial interests with his AI models because basic research yields only a low return on investment. Instead, he apparently wants to have an impact on the Chinese economy.

However, his impact is not limited to the Chinese economy. Deepseek seems to be causing a minor tremor in the USA in particular. The stock market values of all companies linked to AI are beginning to falter. If the models are really that powerful and require less computing power, there may not even be a need for data centers costing 500 billion US dollars, such as Project Stargate. The open-source strategy also makes it possible for others to reproduce the models.

Several AI experts have already spoken out. Yann LeCun from Meta wrote on X that Deepseek V3 is “excellent”. Microsoft CEO Satya Nadella warned at the World Economic Forum in Davos that Chinese developments should be taken “very, very seriously”.

Last year, when Deepseek published the first versions of the models, Jim Fan from Nvidia also wrote on X that open-source models could exert enormous pressure on commercial companies. And: “Resource constraints are a beautiful thing. The survival instinct in a mercilessly competitive AI environment is a prime driver for breakthroughs.”

Perplexity CEO Aravind Srinivas suspects: “Necessity is the mother of invention. Because they had to find workarounds, they actually ended up building something much more efficient.”

However, there are also suspicions that Deepseek is not telling the whole truth when it comes to the development of the models. According to CNBC, Chetan Puttagunta from venture capital firm Benchmark has already said that Deepseek may have used so-called model distillation. This involves transferring the knowledge of a large model into a small model. Other AI companies such as Meta are also working on this. Deepseek's chatbot is said to frequently claim that it is ChatGPT itself, which suggests that it was trained on that chatbot's output.
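To illustrate what transferring knowledge from a large model to a small one can mean in practice, here is a minimal sketch of the classic textbook form of distillation, in which a student model is trained to match the softened output probabilities of a teacher model. It is not a description of what Deepseek actually did, which is not publicly confirmed; the function names, the temperature value and the example logits are assumptions made purely for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's and the student's softened outputs.

    The loss is zero when the student reproduces the teacher's distribution
    and grows the further the two distributions drift apart.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )


if __name__ == "__main__":
    # Hypothetical scores for three candidate next tokens.
    teacher = [4.0, 1.0, 0.5]
    good_student = [3.8, 1.1, 0.4]
    poor_student = [0.5, 1.0, 4.0]
    print("close student:", distillation_loss(teacher, good_student))
    print("poor student: ", distillation_loss(teacher, poor_student))
```

In real training, this loss would be minimized over many examples by adjusting the student's weights. A variant that needs only a chatbot's text responses, rather than its internal probabilities, fine-tunes the student directly on those responses.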

Another problem is that the Deepseek models answer some questions in line with the interests of the Chinese government. For example, the events on Tiananmen Square, where protests by a democracy movement were bloodily suppressed in 1989, are concealed. The usual prompting tricks can get the model to write about the massacre. However, such tricks only help if you already know that something is being concealed; otherwise you do not even notice that something is missing.

(emw)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.