Microsoft Azure: AI accelerator Maia 200 aims to surpass Google TPU v7
Microsoft Azure's AI inference accelerator Maia 200 aims to outperform Google TPU v7 and AWS Inferentia with 10 Petaflops of FP4 compute power.
AI accelerator Microsoft Azure Maia 100, image edited.
(Image: Microsoft Azure)
The hyperscale cloud provider Microsoft Azure has announced the second generation of its in-house AI compute accelerator, the Maia 200. It performs 10 quadrillion FP4 operations per second (10 Petaflops), addresses 216 gigabytes of fast HBM3E memory, and can be coupled with other Maia 200 nodes at 1.4 TByte/s.
With these specifications and under 900 watts of power consumption, Maia 200 is expected to surpass the current AI accelerators from Google Cloud (TPU v7) and Amazon AWS (Trainium 3).
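For a sense of scale, here is a rough back-of-envelope calculation of how many FP4 weights fit into the 216 GByte of local HBM3E; this is our own estimate, not an Azure figure, and it ignores KV cache, activations, and runtime overhead, all of which reduce the usable capacity:

```python
# Rough estimate (our own assumption, not Azure data):
# FP4 weights occupy 4 bits = 0.5 bytes per parameter.

HBM_CAPACITY_GB = 216          # per Maia 200 accelerator
BYTES_PER_FP4_PARAM = 0.5      # 4-bit weights

# Parameters that fit into local HBM3E, ignoring KV cache and activations:
max_params_billion = HBM_CAPACITY_GB * 1e9 / BYTES_PER_FP4_PARAM / 1e9
print(f"~{max_params_billion:.0f} billion FP4 parameters fit in {HBM_CAPACITY_GB} GByte of HBM3E")
# -> ~432 billion parameters
```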
For an AI accelerator that customers can only rent as cloud instances, the price is of particular interest, but Azure is not yet revealing it. However, Maia 200 is said to deliver 30 percent more performance per dollar.
Microsoft will first make Maia 200 instances available in the Azure region US Central, followed by US West 3 near Phoenix, Arizona.
Competitor Comparison
To illustrate the advantages of Maia 200, Microsoft is publishing the following table:
Microsoft Azure Maia 200 AI Accelerator Comparison

| Provider | Microsoft Azure | Microsoft Azure | Amazon AWS | Google Cloud |
|---|---|---|---|---|
| AI Accelerator | Maia 200 | Maia 100 | Trainium 3 | TPU v7 |
| BF16 Compute Power | 1268 TFlops | 800 TFlops | 671 TFlops | 2307 TFlops |
| FP8 Compute Power | 5072 TFlops | N/A | 2517 TFlops | 4614 TFlops |
| FP4 Compute Power | 10145 TFlops | N/A | 2517 TFlops | – |
| TDP (estimated) | 880 W | 500 W | 700 W | 1000 W |
| RAM | 216 GByte HBM3E | 64 GByte HBM2E | 144 GByte HBM3E | 192 GByte HBM3E |
| RAM Transfer Rate | 7 TByte/s | 1.8 TByte/s | 4.9 TByte/s | 7.4 TByte/s |
| Interconnect | 1.4 TByte/s | 0.6 TByte/s | 1.2 TByte/s | 0.6 TByte/s |
| Manufacturing Technology | TSMC N3P | TSMC N5 | TSMC N3P | TSMC N3P |
| Chip Area | N/A | 820 mm² | N/A | N/A |

Data from Microsoft Azure; for Maia 100: Microsoft Azure at Hot Chips 2024.
This shows that Maia 200 delivers very high compute power, especially for inferencing large AI models with FP4 weights. Power consumption remains moderate, although it's not entirely clear whether this refers only to the AI accelerator or if the High Bandwidth Memory (HBM3E) and the 28 Ethernet ports, each at 400 Gbit/s, are also included.
The comparison of Maia 200, explicitly designed for inferencing, with AWS Trainium 3 – which primarily targets training – also appears imprecise. We have added the data for Maia 100, available in Microsoft Azure since 2024.
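For inferencing, memory bandwidth often matters as much as raw FP4 compute. The following roofline sketch is purely illustrative: the 7 TByte/s figure comes from the table above, while the 400-billion-parameter model size and the batch-size-1 assumption are our own.

```python
# Illustrative roofline sketch (our own assumptions, not Microsoft data):
# during autoregressive decoding at batch size 1, every generated token
# requires streaming all model weights from HBM once, so memory bandwidth
# caps the token rate regardless of FP4 compute power.

HBM_BANDWIDTH_TBYTE_S = 7.0    # Maia 200 RAM transfer rate from the table
MODEL_PARAMS_BILLION = 400     # assumed model size, for illustration only
BYTES_PER_FP4_PARAM = 0.5      # 4-bit weights

weight_bytes = MODEL_PARAMS_BILLION * 1e9 * BYTES_PER_FP4_PARAM   # ~200 GByte
max_tokens_per_s = HBM_BANDWIDTH_TBYTE_S * 1e12 / weight_bytes
print(f"Bandwidth-bound limit: ~{max_tokens_per_s:.0f} tokens/s per accelerator at batch size 1")
# -> ~35 tokens/s; larger batches or sharding across chips change this picture.
```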
Nvidia's current GB200 (Grace Blackwell Superchip) achieves up to 20,000 TFlops at FP4 with sparsity, but it consists of two AI chips and is rated at around 1.2 kW.
For Huge Models
Microsoft Azure emphasizes that up to 6144 Maia 200 units can be interconnected to process very large AI models. The Microsoft Superintelligence Team is already using Maia 200 to generate synthetic data and for reinforcement learning.
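Multiplying the per-chip figures from the table by 6144 gives a rough upper bound on what such a cluster could hold and compute; the sketch below merely restates the article's numbers at scale and does not account for interconnect or utilization losses.

```python
# Aggregate figures for a fully interconnected Maia 200 cluster, derived
# purely from the per-chip numbers in the article (x 6144 units):

UNITS = 6144
FP4_PFLOPS_PER_UNIT = 10       # ~10 PFlops FP4 per accelerator
HBM_GBYTE_PER_UNIT = 216

total_fp4_eflops = UNITS * FP4_PFLOPS_PER_UNIT / 1000
total_hbm_pbyte = UNITS * HBM_GBYTE_PER_UNIT / 1e6
print(f"~{total_fp4_eflops:.0f} EFlops FP4 compute and ~{total_hbm_pbyte:.2f} PByte of HBM3E in total")
# -> ~61 EFlops FP4 and ~1.33 PByte HBM3E across the cluster
```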
Like Amazon and Google, Microsoft does not develop its AI accelerators entirely in-house. Industry insiders assume that Microsoft uses Marvell as a development partner for Maia. Marvell is also said to have been involved in AWS Trainium, while Google relies on Broadcom for its TPUs. The Taiwanese design services provider Alchip is also said to have developed certain chips for AWS.
(ciw)