Nvidia H200 "Hopper" also available as a PCIe card
Nvidia now also supplies the Hopper computing accelerator in a PCI Express version with 141 GByte HBM3e and announces the Blackwell quartet GB200 NVL4.
Supermicro SuperServer SYS-522GA-NRT with eight Nvidia H200 NVL
(Image: Supermicro)
Nvidia is launching another version of the computing accelerators from the Hopper generation announced more than two years ago: the PCIe x16 card H200 NVL. Thanks to its larger and significantly faster local memory, it is said to process large language models (LLMs) up to 90 percent faster than the H100 NVL announced a year and a half ago. The computing power of the chip itself remains exactly the same. However, the power consumption of the PCIe card increases by 50 percent, from 400 to 600 watts. Nevertheless, the H200 NVL should work more efficiently than the H100 NVL at optimum load, and its power consumption can also be throttled.
Two or four H200 NVLs can be linked via NVLink at 900 GByte/s (450 GByte/s per transfer direction); with the H100 NVL, NVLink only achieves 600 GByte/s. The connection to the server mainboard is made via PCIe 5.0 x16, i.e. with up to 128 GByte/s (64 GByte/s per direction).
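The per-direction figures quoted above are simply half of the aggregate bidirectional rates. A minimal sketch with the values from the text:

```python
# Aggregate bidirectional bandwidths in GByte/s, as stated in the article.
links = {
    "NVLink (H200 NVL)": 900,  # 450 GByte/s per direction
    "NVLink (H100 NVL)": 600,  # 300 GByte/s per direction
    "PCIe 5.0 x16": 128,       # 64 GByte/s per direction
}

for name, total in links.items():
    # Per-direction rate is half the aggregate rate.
    print(f"{name}: {total} GByte/s total, {total // 2} GByte/s per direction")
```

This makes the gap plain: even two linked H200 NVLs exchange data roughly seven times faster over NVLink than over their PCIe 5.0 host connection.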
Supermicro presented the SuperServer SYS-522GA-NRT with eight Nvidia H200 NVL and two Intel Xeon 6900P at the SC'24 conference.
Nvidia has not yet announced prices for the H200 NVL. Its predecessor, the H100 NVL, has been available for a few weeks from around 30,000 euros.
| Nvidia H200: PCIe and SXM versions | | | |
| --- | --- | --- | --- |
| Card/module | H200 SXM | H200 NVL | H100 NVL |
| Connection | SXM | PCIe 5.0 x16 | PCIe 5.0 x16 |
| Type | SXM | 2 slots | 2 slots |
| Power consumption | 700 W | max. 600 W | 300 – 400 W |
| RAM | 141 GByte HBM3e | 141 GByte HBM3e | 94 GByte HBM3 |
| Transfer rate | 4.8 TByte/s | 4.8 TByte/s | 3.9 TByte/s |
| NVLink | 0.9 TByte/s | 0.9 TByte/s | 0.6 TByte/s |
| Maximum theoretical Tensor Core computing power | | | |
| Int8/FP8 with sparsity | 3.958 POps/PFlops | 3.341 POps/PFlops | 3.341 POps/PFlops |
| FP16 or BF16 with sparsity | 1.979 PFlops | 1.671 PFlops | 1.671 PFlops |
| TF32 with sparsity | 989 TFlops | 835 TFlops | 835 TFlops |
| FP64 or FP32 | 67 TFlops | 60 TFlops | 60 TFlops |
| FP64 non-Tensor | 34 TFlops | 30 TFlops | 30 TFlops |
| Sparsity: sparsely populated matrices | | | |
Blackwell quartet
Nvidia is also delivering the first versions of the Hopper successor Blackwell. The Grace Blackwell Superchip GB200, a combination processor consisting of one CPU die (Grace, 72 ARM cores) and two B200 accelerators, is already being used in some new Top500 supercomputers.
At SC'24, Nvidia announced a new GB200 package, the GB200 NVL4. It combines four B200s with two Grace chips and is due to ship in the second half of 2025.
Basically, a GB200 NVL4 consists of two of the GB200 NVL2s presented in June. This means there are 768 instead of 384 GBytes of fast HBM3e plus 960 instead of 480 GBytes of LPDDR5X on the Grace chips. A total of 1.7 TByte RAM is therefore available. The maximum power consumption of the GB200 NVL4 is 5.4 kW, which is why the module should be particularly useful in water-cooled systems.
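The 1.7 TByte total follows from doubling the two memory pools of a GB200 NVL2; a quick check with the figures from the text:

```python
# Memory pools of a GB200 NVL2 in GByte, as stated in the article.
hbm3e_nvl2 = 384     # fast HBM3e on the B200 accelerators
lpddr5x_nvl2 = 480   # LPDDR5X on the Grace chips

# A GB200 NVL4 is essentially two NVL2s, so both pools double.
hbm3e_nvl4 = 2 * hbm3e_nvl2      # 768 GByte
lpddr5x_nvl4 = 2 * lpddr5x_nvl2  # 960 GByte

total = hbm3e_nvl4 + lpddr5x_nvl4
print(total)  # 1728 GByte, i.e. roughly 1.7 TByte
```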
(ciw)