Deepseek publishes multimodal model Janus-Pro

Even more competition for Silicon Valley: Deepseek from China can also keep up with Dall-E and Stable Diffusion.


Images generated by Janus-Pro, as shown in the paper

(Image: Deepseek)


The Chinese AI company Deepseek, which is currently shaking up the AI world and the stock market, is now also releasing an image generator: a new model in its multimodal model family, Janus. Janus-Pro is said to compete with OpenAI's Dall-E 3 image generator, among others.

Like Deepseek's other AI models, R1 and V3, Janus-Pro is freely available as open source under the MIT license. The new model can be found on Hugging Face, for example. Janus-Pro is the successor to Janus and is significantly larger and more powerful. The model comes with the usual capabilities of a multimodal model: it can generate images, but also understand them, and is said to remain highly consistent.


The images shown in the published paper are photorealistic and can compete with Midjourney in terms of quality, although results in actual use may differ. The images also show the word "Hello": rendering text is a notoriously difficult task for image generators, and one the original Janus evidently could not yet handle.

Deepseek describes Janus-Pro as a "novel autoregressive framework". According to the authors, the version with seven billion parameters even outperforms Dall-E 3, Stable Diffusion XL, and other image generators in some benchmarks.

The release of the multimodal model arrives at the height of the hype surrounding the Chinese company. The R1 and V3 models have caused a stir in Silicon Valley and on the stock market, and numerous AI experts, including major investor Marc Andreessen, have praised the developments. Because the models were reportedly trained far more cheaply and with fewer AI chips, the stock market value of Nvidia, for example, plummeted.

It is not entirely clear how the Chinese company achieves the quality of the Deepseek models. There are allegations that model distillation was used and that R1 and V3 were trained on ChatGPT output. In model distillation, the knowledge of a large model (the teacher) is transferred to a smaller one (the student). One indication: the models are said to have occasionally responded that they are ChatGPT.
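The article does not describe how any such distillation would have been carried out, and the allegation is unconfirmed. As a generic illustration only, the core of classic knowledge distillation is a loss that pushes the student's output distribution toward the teacher's softened ("temperature-scaled") distribution. The following minimal numpy sketch, with made-up logits, shows that idea; all names and values here are hypothetical:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over the last axis.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    A temperature > 1 flattens both distributions, so the student also
    learns from the teacher's relative probabilities among wrong answers.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Hypothetical logits over a 4-token vocabulary
teacher = np.array([4.0, 1.0, 0.5, 0.1])
student = np.array([3.5, 1.2, 0.4, 0.2])

loss = distillation_loss(teacher, student)
# loss is non-negative and shrinks as the student's
# distribution approaches the teacher's
```

In training, this loss (or a variant) would be minimized over many examples, typically alongside an ordinary cross-entropy loss on the true labels.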

(emw)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.