Nvidia H200 "Hopper" also available as a PCIe card

Nvidia now also offers the Hopper computing accelerator as a PCI Express card with 141 GByte of HBM3e, and has announced the GB200 NVL4, a quartet of Blackwell chips.

Supermicro SuperServer SYS-522GA-NRT with eight Nvidia H200 NVL

(Image: Supermicro)


Nvidia is launching another version of the computing accelerators from the Hopper generation announced more than two years ago: the PCIe x16 card H200 NVL. Thanks to its larger and significantly faster local memory, it is said to process large language models (LLMs) up to 90 percent faster than the H100 NVL announced 1.5 years ago. The chip's raw computing power remains exactly the same, but the power consumption of the PCIe card increases by 50 percent, from 400 to 600 watts. Nevertheless, at optimum load the H200 NVL should work more efficiently than the H100 NVL, and its power consumption can also be throttled.
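A quick sanity check of the efficiency claim, using only the figures quoted above (90 percent faster, 600 W instead of 400 W): if throughput grows faster than power draw, performance per watt improves.

```python
# Illustrative arithmetic only, based on the numbers in the article:
# up to 90 percent faster LLM processing at 50 percent higher board power.

h100_nvl_power_w = 400
h200_nvl_power_w = 600

speedup = 1.90                                      # Nvidia's claim for large LLMs
power_ratio = h200_nvl_power_w / h100_nvl_power_w   # 1.5

# Performance per watt improves because the speedup outpaces the power increase.
perf_per_watt_gain = speedup / power_ratio
print(f"perf/W gain: {perf_per_watt_gain:.2f}x")    # roughly 1.27x
```

So even at full power the H200 NVL comes out around a quarter more efficient than its predecessor on this workload, consistent with the article's claim.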

Two or four H200 NVLs can be linked via NVLink at 900 GByte/s (450 GByte/s per transfer direction); with the H100 NVL, NVLink only achieves 600 GByte/s. The connection to the server mainboard is made via PCIe 5.0 x16, i.e. with up to 128 GByte/s (64 GByte/s per direction).
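The bandwidth figures above can be reproduced with back-of-the-envelope arithmetic (raw link rates, ignoring PCIe 128b/130b encoding and protocol overhead):

```python
# Raw link-rate check for the figures quoted above.

def pcie_gbyte_per_s(gt_per_s: float, lanes: int) -> float:
    """Raw PCIe bandwidth per direction: one bit per transfer per lane."""
    return gt_per_s * lanes / 8   # GT/s * lanes -> Gbit/s -> GByte/s

pcie5_x16_per_dir = pcie_gbyte_per_s(32, 16)   # 64 GByte/s per direction
pcie5_x16_total = 2 * pcie5_x16_per_dir        # 128 GByte/s both directions

nvlink_total = 900                             # GByte/s, H200 NVL
print(nvlink_total / pcie5_x16_total)          # NVLink is ~7x the PCIe 5.0 x16 link
```

This is why Nvidia bridges the cards directly with NVLink instead of routing accelerator-to-accelerator traffic through the host's PCIe slots.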

Supermicro presented the SuperServer SYS-522GA-NRT with eight Nvidia H200 NVL and two Intel Xeon 6900P at the SC'24 conference.


Nvidia has not yet announced prices for the H200 NVL. Its predecessor, the H100 NVL, has been available for a few weeks from around 30,000 euros.

Nvidia H200: PCIe and SXM versions

Card/module                  H200 SXM           H200 NVL           H100 NVL
Connection                   SXM                PCIe 5.0 x16       PCIe 5.0 x16
Form factor                  SXM                2 slots            2 slots
Power consumption            700 W              max. 600 W         300 – 400 W
RAM                          141 GByte HBM3e    141 GByte HBM3e    94 GByte HBM3
Transfer rate                4.8 TByte/s        4.8 TByte/s        3.9 TByte/s
NVLink                       0.9 TByte/s        0.9 TByte/s        0.6 TByte/s
Maximum theoretical Tensor-core computing power:
Int8/FP8 with sparsity       3.958 POps/PFlops  3.341 POps/PFlops  3.341 POps/PFlops
FP16 or BF16 with sparsity   1.979 PFlops       1.671 PFlops       1.671 PFlops
TF32 with sparsity           989 TFlops         835 TFlops         835 TFlops
FP64 or FP32                 67 TFlops          60 TFlops          60 TFlops
FP64 non-Tensor              34 TFlops          30 TFlops          30 TFlops
Sparsity: sparsely populated matrices
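The Tensor-core figures in the table are internally consistent up to rounding: each sparsity number is twice the dense rate, and each halving of precision doubles throughput. A quick check for the H200 SXM column:

```python
# Cross-check of the H200 SXM Tensor-core figures from the table.
# Values in TFlops (TOps for Int8); sparsity rates are 2x the dense rates.

h200_sxm = {
    "fp8_sparse": 3958,    # Int8/FP8 with sparsity
    "fp16_sparse": 1979,   # FP16/BF16 with sparsity
    "tf32_sparse": 989,    # TF32 with sparsity
    "fp64_tensor": 67,     # FP64 via Tensor cores
    "fp64": 34,            # FP64 non-Tensor
}

# Each precision step doubles throughput (allow +/-1 TFlops for rounding).
assert abs(h200_sxm["fp8_sparse"] - 2 * h200_sxm["fp16_sparse"]) <= 1
assert abs(h200_sxm["fp16_sparse"] - 2 * h200_sxm["tf32_sparse"]) <= 1
assert abs(h200_sxm["fp64_tensor"] - 2 * h200_sxm["fp64"]) <= 1
print("table figures consistent")
```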

Nvidia is also delivering the first versions of the Hopper successor Blackwell. The Grace Blackwell Superchip GB200, a combination processor consisting of one CPU die (Grace, 72 ARM cores) and two B200 accelerators, is already being used in some new Top500 supercomputers.

At SC'24, Nvidia announced a new GB200 package, the GB200 NVL4. It combines four B200s with two Grace chips and is due to be delivered in the second half of 2025.

A GB200 NVL4 essentially consists of two of the GB200 NVL2 modules presented in June. This means 768 instead of 384 GByte of fast HBM3e, plus 960 instead of 480 GByte of LPDDR5X on the Grace chips, for a total of around 1.7 TByte of RAM. The maximum power consumption of the GB200 NVL4 is 5.4 kW, which is why the module is best suited to water-cooled systems.
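The memory total follows directly from doubling the GB200 NVL2 configuration, as described above:

```python
# Memory totals for the GB200 NVL4: two GB200 NVL2 configurations combined.

hbm3e_gbyte = 2 * 384     # 768 GByte HBM3e on the four B200 accelerators
lpddr5x_gbyte = 2 * 480   # 960 GByte LPDDR5X on the two Grace chips

total_tbyte = (hbm3e_gbyte + lpddr5x_gbyte) / 1000
print(total_tbyte)        # 1.728, i.e. the "around 1.7 TByte" quoted above
```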


(ciw)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.