AMD Instinct MI350P: ultra-fast AI accelerator as a PCI Express card

The Instinct MI350P is AMD's current AI accelerator with large HBM3e memory in the form of a PCIe card, which in theory makes it usable in regular computers.

Rendering of a bare Instinct MI350P

(Image: AMD)


AMD's Instinct MI350P for regular PCIe 5.0 slots is intended primarily for agentic AI, meaning AI agents that can automate tasks and assist users. In addition to enormous AI computing power and high memory throughput, the card offers a few other features, including hardware acceleration of current video codecs up to AV1 and partitioning into up to four virtual GPUs.

And although it could also run in normal computers, AMD is targeting server systems, which the MI350P is intended to upgrade for AI workloads. The passive cooler of the roughly 26.7 cm long dual-slot card is designed for the strong airflow of rack servers. According to AMD, its 144 GB of HBM3e stacked memory makes it suitable for AI models with around 200 to 250 billion parameters. Workstation cards like the Radeon AI Pro 9700 with only 32 GB top out much earlier, at around 40 to 50 billion parameters.
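As a rough plausibility check of these sizing claims, the following sketch estimates the weights-only memory footprint of a model. The bytes-per-parameter figures are common rules of thumb for the respective precisions, not AMD's sizing method, and KV cache plus activations come on top:

```python
# Rule-of-thumb bytes per parameter for common inference precisions
# (illustrative assumptions, not AMD's own sizing method).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}

def weight_gb(params_billion: float, precision: str) -> float:
    """Weights-only footprint in GB; KV cache and activations not included."""
    return params_billion * BYTES_PER_PARAM[precision]

# A 250-billion-parameter model quantized to 4 bits:
print(weight_gb(250, "fp4"))  # 125.0 GB -> fits in 144 GB with some headroom

# A 50-billion-parameter model at 4 bits still fits a 32 GB workstation card:
print(weight_gb(50, "fp4"))   # 25.0 GB
```

At 8- or 16-bit precision the same models would overflow both cards, which is consistent with the parameter ranges AMD cites referring to heavily quantized models.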

The MI350P shares its GPU with AMD's accelerators in the Open Accelerator Module (OAM) form factor, the Instinct MI350X/MI355X, but only 128 compute units are active in the MI350P, while 256 CUs compute in the OAM models. AMD also halves the fast HBM3e stacked memory from 288 to 144 GB. While AMD doesn't officially document this, the card's layout suggests what's likely: the MI350P uses only one I/O die (IOD) with four compute dies (XCDs), halving the GPU package compared to its larger siblings.

The Instinct MI350P is intended to complement the OAM server boards from below and, for example, help existing rack servers make AI leaps.

(Image: AMD)

The power consumption also drops significantly, to a nominal 600 watts, matching the TDP of the Nvidia RTX Pro 6000 Blackwell and the H200 NVL, with which the card is clearly intended to compete. For power delivery, AMD relies on the controversial 12V-2x6 connector. Alternatively, the card can be switched to a 450-watt mode.

To serve multiple users simultaneously, there are three partitioning modes: SPX, DPX, and CPX. SPX corresponds to full, unpartitioned operation; in DPX, two users share the resources (CUs, RAM, video and JPEG engines, L2 cache, and DMA engines) equally; and in CPX, four users do. In CPX mode, two partitions each compete for one video engine and one ten-block JPEG engine. These should still have plenty of headroom, however, as the complete chip can handle 99 AV1 streams (1080p30, 4:2:0) and 4425 JPEG images per second at 1080p.
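The equal split across partitions boils down to simple division. The CU count and memory capacity below are the figures from the article; the per-partition view is a sketch, not AMD's management interface:

```python
# Number of equal partitions per mode, as described in the text.
MODES = {"SPX": 1, "DPX": 2, "CPX": 4}

def partition(mode: str, cus: int = 128, hbm_gb: int = 144) -> dict:
    """Per-partition share of CUs and HBM3e for a given partitioning mode."""
    n = MODES[mode]
    return {"partitions": n, "cus_each": cus // n, "hbm_gb_each": hbm_gb // n}

print(partition("CPX"))  # {'partitions': 4, 'cus_each': 32, 'hbm_gb_each': 36}
```

Even a CPX quarter thus still offers 36 GB of HBM3e, more than a 32 GB workstation card.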


AMD did not provide specific performance measurements in advance, but the theoretical computing power, calculated from the number of execution units and the clock frequency, is 2300 teraflops at FP8 precision (densely packed matrices; roughly doubling with sparsity). MXFP4 doubles this rate to 4600 teraflops, and MXFP6 runs at the same doubled rate, which is not the case on Nvidia hardware, for example. That puts the theoretical computing power at slightly less than half that of an MI355X. Nvidia's H200 NVL achieves around 1670 teraflops on paper with densely packed matrices (3340 teraflops with sparsity).
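A hedged back-of-the-envelope check of the 2300-teraflop figure: assuming 8192 dense FP8 operations per CU per clock and a boost clock around 2.2 GHz (both illustrative assumptions, not AMD-confirmed numbers), 128 CUs land very close to the quoted value:

```python
def peak_tflops(cus: int, ops_per_cu_per_clock: int, clock_ghz: float) -> float:
    """Theoretical peak: CUs x ops/CU/clock x clock (GHz -> teraflops)."""
    return cus * ops_per_cu_per_clock * clock_ghz / 1e3

# Assumed per-CU rate and clock; only the 128 CUs are from AMD's spec.
fp8_dense = peak_tflops(128, 8192, 2.2)
print(round(fp8_dense))      # ~2307, close to the quoted 2300 teraflops
print(round(fp8_dense * 2))  # MXFP4/MXFP6 double the rate, ~4600 teraflops
```

The same formula with 256 CUs reproduces why the OAM siblings sit at roughly twice the rate.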

AMD does, however, provide an estimate of the throughput actually achieved, which factors in memory transfers and power limits. According to this, the Instinct MI350P reaches between 60 and 70 percent of its maximum throughput rates. The outlier at the low end is MXFP6 at 40 percent of its theoretical throughput, so the sustained rate only rises by a good third over (MX)FP8 instead of doubling.
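The effect of these efficiency estimates can be sketched as follows. The peak rates and the 40 percent MXFP6 figure are from AMD's estimates as cited above; using the midpoint of the 60 to 70 percent range for FP8 is my simplification:

```python
# Peak rates (teraflops, dense) and AMD's cited efficiency estimates.
PEAK_TFLOPS = {"fp8": 2300, "mxfp4": 4600, "mxfp6": 4600}
EFFICIENCY = {"fp8": 0.65, "mxfp4": 0.65, "mxfp6": 0.40}  # 0.65 = midpoint of 60-70 %

def sustained(fmt: str) -> float:
    """Estimated sustained throughput in teraflops for a number format."""
    return PEAK_TFLOPS[fmt] * EFFICIENCY[fmt]

print(round(sustained("fp8")))    # ~1495 teraflops
print(round(sustained("mxfp6")))  # ~1840 teraflops, only ~1.2-1.3x FP8
```

Where exactly the gain over FP8 lands (a good third at 60 percent FP8 efficiency, closer to 14 percent at 70 percent) depends on where in AMD's stated range the FP8 workload actually runs.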

The theoretical and practically achievable computing power of the Instinct MI350P sometimes differ significantly. Reasons include, among others, the available electrical power as well as the necessary memory and bus transfers.

(Image: AMD)


(csp)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.