DeepSeek: A look behind the scenes of the Reasoning Model R1
The new DeepSeek R1 model impresses with strong performance at low hardware cost. How does the model work, and what does it mean for AI development?
(Image: Created with AI (Midjourney) by the iX editorial team)
- Dr. Christian Winkler
OpenAI is the established market leader for language models, and the entire (Gen)AI world depends on Nvidia, because ever better models can only be trained with vast quantities of its very expensive GPUs. Meta is planning a data center half the size of Manhattan, wants to buy a million GPUs, and will need (several) gigawatts of power – on the scale of a nuclear power plant.
And then suddenly a start-up from China, which is not externally funded, comes along and presents a model that can compete with the largest OpenAI models and in some cases even beats them. The training required for this "only" amounted to 2.9 million GPU hours. One GPU hour on an H200 costs around two dollars, so it took less than six million dollars to train such a model. Allegedly, the annual salary of each of the 13 Meta managers responsible for Llama is higher than that.
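The back-of-the-envelope arithmetic behind that figure can be reproduced in a few lines. The numbers are the article's rough estimates, not exact rental rates:

```python
# Rough check of the training-cost estimate quoted in the text.
# Both figures come from the article and are approximations.
gpu_hours = 2_900_000          # ~2.9 million GPU hours of training
price_per_hour_usd = 2.0       # ~2 dollars per H200 GPU hour (assumed rate)

total_cost_usd = gpu_hours * price_per_hour_usd
print(f"Estimated training cost: ${total_cost_usd:,.0f}")
# Estimated training cost: $5,800,000
```

At roughly 5.8 million dollars, the estimate indeed stays below the six-million mark cited above.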
Big tech companies are forming crisis teams because DeepSeek is publishing its model and making it available as an API for a fraction of the cost of GPT-4o. Nvidia's share price falls by 20 percent and its market capitalization by 600 billion dollars, knocking Nvidia from the top spot among the most valuable companies down to third place. How can such upheavals happen within a week? Practically nobody had heard of DeepSeek until now. How could the entire industry be caught so off guard?
This was the excerpt of our heise-Plus article "DeepSeek: A look behind the scenes of the Reasoning Model R1". With a heise-Plus subscription, you can read the full article.