CES

DGX Vera Rubin: Nvidia shows next AI server generation with in-house CPU

Nvidia CEO Jensen Huang gave an outlook on the upcoming AI server DGX Vera Rubin with in-house ARM processor cores and new GPU architecture at CES.


(Image: Florian Müssig / heise medien)


At its CES keynote, Nvidia unveiled its next-generation AI server, DGX Vera Rubin. It comprises six distinct chips, all of which are already running in Nvidia's labs. Finished systems, however, are not expected to reach the market until the second half of 2026.

The heart of the server is Nvidia's new GPU architecture, Rubin. It succeeds Blackwell, which powers all current products from the GB200 and DGX Spark to the GeForce RTX 5000 series. Nvidia CEO Jensen Huang gave no details on the new architecture, only selected key figures: Rubin is said to deliver 50 petaflops in Nvidia's own NVFP4 data format, a fivefold increase over Blackwell.
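For context, NVFP4 is generally reported to build on a 4-bit floating-point element format with two exponent bits and one mantissa bit (E2M1). The following sketch decodes such a nibble into its real value; the per-block scaling that NVFP4 layers on top is omitted here, and the E2M1 interpretation is an assumption based on published microscaling formats, not on Nvidia's keynote.

```python
# Hedged sketch: decoding a 4-bit E2M1 value, the element format
# commonly reported as the basis of NVFP4 (per-block scale factors
# are an additional layer and are intentionally omitted here).
def decode_e2m1(nibble: int) -> float:
    sign = -1.0 if (nibble >> 3) & 1 else 1.0
    exp = (nibble >> 1) & 0b11   # 2-bit exponent
    man = nibble & 1             # 1-bit mantissa
    if exp == 0:                 # subnormal: no implicit leading 1
        return sign * man * 0.5
    return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)

# All eight positive code points of E2M1:
values = [decode_e2m1(n) for n in range(8)]
# → [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

With only 16 code points per element, such formats rely on shared scale factors per small block of values to cover a useful dynamic range, which is what makes 4-bit inference throughput figures like the one quoted above feasible.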

A few more details were given about the ARM processor Vera. Unlike its predecessor Grace, it does not use off-the-shelf Neoverse cores but Nvidia's self-developed Olympus cores. The processor contains 88 of them, processing 176 threads in parallel. This is not conventional SMT but something Nvidia calls “Spatial Multi-Threading”: according to reports, incoming threads are simply distributed alternately across the cores' internal ports. Because Nvidia has been contributing CPU support to open-source compilers since 2025, it is known that Vera will support the Armv9.2 instruction set. According to our information, official certification from IP holder Arm is still pending.

(Image: Florian Müssig / heise medien)

On average, Vera Rubin is expected to deliver the same computing power as its predecessor GB200 (Grace Blackwell) with a quarter of the GPUs; the cost per token is even said to be just one-seventh. Nvidia is likely saving deeper details about Vera, Rubin, and their concrete implementations for presentations at its in-house GTC trade fair, scheduled for March.

The AI server DGX Vera Rubin comprises more than just these two chips. The GPUs are interconnected via NVLink 6 switch chips; one NVLink module contains four of these switches and links dozens of GPUs. In the intended cluster configuration, still called NVL72 (Nvidia has abandoned the previously planned renaming to NVL144), 72 GPUs are ultimately combined into a single shared computing unit.

The new in-house network cards ConnectX-9, BlueField-4, and Spectrum-X are responsible for external connections. The latter uses silicon photonics, i.e., fiber-optic connections routed directly to the die of the network chip.

(Image: Florian Müssig / heise medien)

With DGX Vera Rubin, the third NVL72 configuration, Nvidia wants not only to increase computing power but also to make life easier for technicians in data centers. To that end, Nvidia dispenses entirely with cables that would get in the way during maintenance, and faulty components can be swapped out while the system keeps running. All of this yields massive time savings when failures occur: according to Nvidia, an NVLink tray can now be replaced in just six minutes, a job that took 100 minutes on the predecessor.


Finally, DGX Vera Rubin addresses the problem that the context AI models work with in practical use (inferencing) keeps growing, while bandwidth to the storage subsystem has long been a bottleneck. The new DGX generation therefore includes an intermediate layer, the aptly named Inferencing Context Memory Storage Platform, whose SSDs connect to the compute nodes particularly rapidly via Spectrum-X. This lets the actual compute units access required data up to 20 times faster.

On the software side, Jensen Huang announced new open-source models: new versions of Nemotron, Cosmos, and Groot, plus a completely new model called Alpamayo. Alpamayo is a reasoning model for autonomous vehicles that enables them to handle unexpected situations for which they have not been explicitly trained, as the Level 4 definition requires. As an example, Nvidia cited failed traffic lights; exactly such an incident recently paralyzed Waymo's autonomous fleet in San Francisco.

(Image: Florian Müssig / heise medien)

heise medien is an official media partner of CES 2026.


(mue)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.