Nvidia H200 "Hopper" also available as a PCIe card
Nvidia now also supplies the Hopper computing accelerator in a PCI Express version with 141 GByte HBM3e and announces the Blackwell quartet GB200 NVL4.
Supermicro SuperServer SYS-522GA-NRT with eight Nvidia H200 NVL
(Image: Supermicro)
Nvidia is launching another version of the computing accelerators from the Hopper generation announced more than two years ago: the PCIe x16 card H200 NVL. Thanks to its larger and significantly faster local memory, it is said to process large language models (LLMs) up to 90 percent faster than the H100 NVL announced a year and a half ago. The computing power of the chip itself remains exactly the same. However, the power consumption of the PCIe card increases by 50 percent, from 400 to 600 watts. Nevertheless, the H200 NVL should work more efficiently than the H100 NVL at optimum load, and its power consumption can also be throttled.
Two or four H200 NVLs can be linked via NVLink at 900 GByte/s (450 GByte/s per transfer direction); with the H100 NVL, NVLink only achieves 600 GByte/s. The connection to the server mainboard is made via PCIe 5.0 x16, i.e. with up to 128 GByte/s (64 GByte/s per direction).
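The per-direction figures quoted above are simply half of the aggregate bidirectional rates. A minimal sketch with the values from the text:

```python
# Aggregate bidirectional bandwidths in GByte/s, as stated in the article.
links = {
    "NVLink (H200 NVL)": 900,  # 450 GByte/s per direction
    "NVLink (H100 NVL)": 600,  # 300 GByte/s per direction
    "PCIe 5.0 x16": 128,       # 64 GByte/s per direction
}

for name, total in links.items():
    # Per-direction rate is half the aggregate rate.
    print(f"{name}: {total} GByte/s total, {total // 2} GByte/s per direction")
```

This makes the gap plain: even two linked H200 NVLs exchange data roughly seven times faster over NVLink than over their PCIe 5.0 host connection.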
Supermicro presented the SuperServer SYS-522GA-NRT with eight Nvidia H200 NVL and two Intel Xeon 6900P at the SC'24 conference.
Nvidia has not yet announced prices for the H200 NVL. Its predecessor, the H100 NVL, has been available for a few weeks from around 30,000 euros.
| Nvidia H200: PCIe and SXM versions | | | |
| --- | --- | --- | --- |
| Card/module | H200 SXM | H200 NVL | H100 NVL |
| Connection | SXM | PCIe 5.0 x16 | PCIe 5.0 x16 |
| Type | SXM | 2 slots | 2 slots |
| Power consumption | 700 W | max. 600 W | 300 – 400 W |
| RAM | 141 GByte HBM3e | 141 GByte HBM3e | 94 GByte HBM3 |
| Transfer rate | 4.8 TByte/s | 4.8 TByte/s | 3.9 TByte/s |
| NVLink | 0.9 TByte/s | 0.9 TByte/s | 0.6 TByte/s |
| Maximum theoretical Tensor Core computing power | | | |
| Int8/FP8 with sparsity | 3.958 POps/PFlops | 3.341 POps/PFlops | 3.341 POps/PFlops |
| FP16 or BF16 with sparsity | 1.979 PFlops | 1.671 PFlops | 1.671 PFlops |
| TF32 with sparsity | 989 TFlops | 835 TFlops | 835 TFlops |
| FP64 or FP32 | 67 TFlops | 60 TFlops | 60 TFlops |
| FP64 non-Tensor | 34 TFlops | 30 TFlops | 30 TFlops |
| Sparsity: sparsely populated matrices | | | |
Blackwell quartet
Nvidia is also delivering the first versions of the Hopper successor Blackwell. The Grace Blackwell Superchip GB200, a combination processor consisting of one CPU die (Grace, 72 ARM cores) and two B200 accelerators, is already being used in some new Top500 supercomputers.
At SC'24, Nvidia announced a new GB200 package, the GB200 NVL4. It combines four B200s with two Grace chips and is due to ship in the second half of 2025.
Basically, a GB200 NVL4 consists of two of the GB200 NVL2s presented in June. This means there are 768 instead of 384 GBytes of fast HBM3e plus 960 instead of 480 GBytes of LPDDR5X on the Grace chips. A total of 1.7 TByte RAM is therefore available. The maximum power consumption of the GB200 NVL4 is 5.4 kW, which is why the module should be particularly useful in water-cooled systems.
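The 1.7 TByte total follows from doubling the two memory pools of a GB200 NVL2; a quick check with the figures from the text:

```python
# Memory pools of a GB200 NVL2 in GByte, as stated in the article.
hbm3e_nvl2 = 384     # fast HBM3e on the B200 accelerators
lpddr5x_nvl2 = 480   # LPDDR5X on the Grace chips

# A GB200 NVL4 is essentially two NVL2s, so both pools double.
hbm3e_nvl4 = 2 * hbm3e_nvl2      # 768 GByte
lpddr5x_nvl4 = 2 * lpddr5x_nvl2  # 960 GByte

total = hbm3e_nvl4 + lpddr5x_nvl4
print(total)  # 1728 GByte, i.e. roughly 1.7 TByte
```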
(ciw)