Nvidia publishes details of the powerful GB10 combination processor

At the Hot Chips conference, Nvidia revealed technical details of the GB10 combined processor developed with Mediatek. The launch date is still open.

Naked Nvidia processor on a motherboard

Nvidia GB10.

(Image: heise medien)


Nvidia unveiled its compact DGX Spark workstation at CES at the beginning of January (at that time still under the codename Project Digits) and also provided initial details on the GB10 combined processor used in it. However, the exact partitioning of the chiplet package was only revealed at the Hot Chips conference currently taking place in Stanford, California. The word chiplet does not exist in Nvidia's vocabulary: by analogy with a "whole" die, the company speaks of dielets instead. There are only two of these, namely the one with the GPU (G dielet) and the one with the rest of the system-on-chip (S dielet). Both are manufactured by TSMC in an unspecified 3-nanometer process.

The GPU chiplet comes from Nvidia itself and houses a Blackwell generation graphics unit with 5th generation Tensor cores. The graphics unit supports DLSS 4 and ray tracing; the data sheet lists 32 teraflops for CUDA computing tasks. Unusually, according to the block diagrams shown, the video decoder and encoder are also part of the GPU chiplet. These are usually found in the SoC part of a chiplet design so that the GPU chiplet can remain switched off during video playback.

The connection to the SoC chiplet is made using Nvidia's own NVLink-C2C, which allows up to 600 GByte/s to flow. This bandwidth is necessary because the GPU has only 24 MB of cache. All memory accesses beyond that must go through the SoC chiplet via NVLink-C2C, because the memory controllers for the 128 GByte of LPDDR5X-9400, attached over a 256-bit interface, are housed there.
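To put the figures above in relation, the theoretical peak bandwidth of the memory interface can be estimated from the transfer rate and bus width given in the text. The following is a minimal sketch, not an Nvidia specification: it assumes that "LPDDR5X-9400" means 9400 megatransfers per second per pin and that 1 GByte equals 10^9 bytes. The function and constant names are purely illustrative.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GByte/s (1 GByte = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


# Figures taken from the article; the interpretation is an assumption.
LPDDR5X_MT_S = 9400      # assumed: 9400 MT/s per pin
BUS_WIDTH_BITS = 256     # memory interface width
NVLINK_C2C_GB_S = 600    # "up to" figure quoted for NVLink-C2C

dram_gb_s = peak_bandwidth_gb_s(LPDDR5X_MT_S, BUS_WIDTH_BITS)
headroom = NVLINK_C2C_GB_S / dram_gb_s

print(f"LPDDR5X peak: {dram_gb_s:.1f} GByte/s")
print(f"NVLink-C2C vs. DRAM peak: {headroom:.2f}x")
```

Under these assumptions the DRAM interface peaks at roughly 300 GByte/s, about half of the quoted NVLink-C2C figure. The article does not say whether the 600 GByte/s applies per direction or in total, so the ratio is only a rough guide.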


The SoC chiplet was built to order by Mediatek and combines Mediatek's own function blocks (intellectual property, IP) with blocks from other sources. Among other things, the NVLink-C2C interface comes from Nvidia, while the twenty ARM cores come from ARM. According to Nvidia, the cores are grouped into two clusters of ten cores, each with 16 MB of shared level 3 cache. In addition, there is a 16 MB system-level cache, which corresponds to an L4 cache from a CPU perspective. What Nvidia's current slides do not say, but what has been known for some time, is that the two clusters are not identical. Instead, there is one with Cortex-X925 cores and one with Cortex-A725 cores. This means that they are not the same cores as on the GB200 server board, where Nvidia uses Neoverse cores.

Block diagram of GB10: The SoC chiplet on the left comes from Mediatek, the GPU chiplet on the right from Nvidia.

(Image: Nvidia)

The SoC contains a display controller for one HDMI and three DisplayPort outputs. The latter are implemented as USB-C sockets; a USB controller is also on board. Two security controllers are implemented in GB10: one takes care of Secure Boot and other low-level functions, the other is available to UEFI and the operating system and can also serve as a firmware TPM (fTPM). Finally, there is a PCI Express controller for external system components, which supports PCIe 5.0. Eight such lanes connect the ConnectX network chip, which can be used to link two DGX Spark units into one larger system in order to run even larger AI models. Further PCIe lanes run on the motherboard to the M.2 SSD and the Wi-Fi/Bluetooth controller.
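For orientation, the raw throughput of the eight PCIe 5.0 lanes mentioned above can be estimated from the standard's signalling rate of 32 GT/s per lane and its 128b/130b line encoding. This is a rough sketch that ignores protocol overhead such as packet headers; the function name is illustrative and not taken from any Nvidia material.

```python
PCIE5_GT_S = 32.0        # PCIe 5.0 signalling rate per lane in GT/s
ENCODING = 128 / 130     # 128b/130b line encoding efficiency


def pcie5_gb_s(lanes: int) -> float:
    """Approximate peak bandwidth per direction in GByte/s."""
    return PCIE5_GT_S * ENCODING / 8 * lanes


x8_gb_s = pcie5_gb_s(8)
print(f"PCIe 5.0 x8: {x8_gb_s:.1f} GByte/s per direction")
print(f"PCIe 5.0 x8: {x8_gb_s * 8:.0f} Gbit/s per direction")
```

The result is an upper bound for the link between the SoC and the ConnectX chip. The article does not state the network speed of the ConnectX ports, so no comparison is made here.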

According to Nvidia, GB10 can consume up to 140 watts in the DGX Spark, which is distributed between the CPU and GPU cores depending on the computing load. The latter is typical for SoCs and would not raise eyebrows if it were not for an unusual implementation. As mentioned above, GB10 consists of two chiplets from two companies, which are connected via NVLink-C2C. NVLink-C2C is itself a data interface, but not a power distribution interface. The chosen solution: Although both chiplets are located on one package and logically act as one large whole, they are separate at the supply level – and therefore each require their own power supplies.

Of course, Nvidia has not revealed this unusual fact itself. And none of the DGX Spark partners need to care, because Nvidia supplies them with ready-made motherboards. However, GB10 is to receive the closely related offshoot N1X, which is intended for gaming notebooks with up to 80 watts of thermal design power (TDP). Notebook manufacturers are therefore cursing, because they have to build two power supplies onto their custom motherboards, which is expensive and takes up space. Moreover, both have to be dimensioned for the peak value of 80 watts, even though in practice each will mostly run well below that.
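The cost of the split supply can be illustrated with simple arithmetic. The sketch below assumes, as described in the text, that each of the two supplies must be able to deliver the full 80 watts on its own while the combined draw never exceeds the TDP. The sample splits between CPU and GPU are hypothetical values chosen for illustration, not measurements.

```python
TDP_W = 80.0   # N1X thermal design power from the article
RAILS = 2      # one supply per chiplet


def provisioned_watts(tdp_w: float, rails: int) -> float:
    """Installed supply capacity if every rail is sized for the full TDP."""
    return tdp_w * rails


def utilisation(cpu_w: float, gpu_w: float, tdp_w: float = TDP_W) -> float:
    """Share of the installed capacity actually used for a given split."""
    if cpu_w + gpu_w > tdp_w:
        raise ValueError("combined draw exceeds the TDP budget")
    return (cpu_w + gpu_w) / provisioned_watts(tdp_w, RAILS)


installed_w = provisioned_watts(TDP_W, RAILS)
print(f"Installed capacity: {installed_w:.0f} W for an {TDP_W:.0f} W budget")

# Hypothetical CPU/GPU splits of the shared budget.
for cpu_w, gpu_w in [(80, 0), (40, 40), (15, 65)]:
    print(f"CPU {cpu_w} W + GPU {gpu_w} W -> {utilisation(cpu_w, gpu_w):.0%} used")
```

Under these assumptions, no split of the budget can use more than half of the installed supply capacity, which is why the duplicated supplies are expensive in terms of board space and cost.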

A conference participant's question as to when he could expect his Spark pre-order to be delivered caused amusement in the audience, but Nvidia once again failed to provide an answer at Hot Chips. The background: pre-orders have been possible since spring for several thousand euros, but even the vague availability dates given so far have all been missed. To date, neither Nvidia itself nor its partners, who want to sell Spark systems that are identical apart from the housing and cooling, have delivered anything. There has not even been a public demo of a running DGX Spark system.

Nvidia will not only sell the DGX Spark workstation with GB10 itself, but also through partners (back row). However, these partners can only customize the housing design and cooling system.

(Image: heise medien / Florian Müssig)

Nvidia has not officially given any reasons, but informed circles have long reported that errors in the chip required rework. The display controller had such a serious bug in the first stepping (it could output only a single screen resolution) that the developers had to go back to the digital drawing board. New exposure masks were needed, which sets any chip development back by several months. The respin for the display controller has reportedly now been completed, but according to insiders this was not the only problem area. The CPU cores are also said to have had an issue, which the engineers were ultimately able to resolve without the need for new exposure masks.

Nvidia itself only plans to use its own Ubuntu Linux derivative DGX OS for DGX Spark, but all partners also want to offer their customers Windows. And Windows support is essential for gaming notebooks with N1X, so eventually Nvidia will have to create suitable drivers. After all, Microsoft also needs to be brought on board: GB10 and N1X are the first ARM processors for Windows 11 that do not come from Qualcomm. It wouldn't be surprising if there were a few snags under the hood.

And then there is the sticking point that Windows 11, unlike Linux, does not currently support true unified memory. This is already causing problems for AMD's Strix Halo, also known as Ryzen AI Max 300: there, the CPU and GPU physically use the same memory, but not logically. Both have separate memory areas, between which data must still be copied whenever work is handed over between CPU and GPU. Only Microsoft itself can solve this problem.


(mue)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.