HPE: First AMD-based AI Turnkey System
HPE is launching a turnkey system featuring AMD's AI rack-scale architecture "Helios" and 72 Instinct MI455X GPUs per rack.
(Image: HPE)
- Harald Weiss
At its recent Discover customer event, HPE announced its first turnkey system based on AMD's AI rack-scale architecture "Helios" with an Ethernet scale-up network. The system uses specially developed Juniper Networking hardware and software as well as Broadcom's Tomahawk 6 network chip, and it is based on the open UALoE (Ultra Accelerator Link over Ethernet) standard. The rack-scale solution, which follows the Open Compute Project (OCP) ORW (Open Rack Wide) specifications, is designed for optimized energy consumption, modern liquid cooling, and easy maintenance, offering an alternative to proprietary GPU interconnects such as Nvidia's NVLink.
With 72 AMD Instinct MI455X GPUs per rack, the system offers an aggregate scale-up bandwidth of 260 TB/s and up to 2.9 AI exaflops of FP4 performance, along with 31 TB of HBM4 memory and a memory bandwidth of 1.4 PB/s. This is complemented by a new scale-up Ethernet switch, developed in collaboration with Broadcom, that delivers optimized performance for AI workloads over standard Ethernet. The switch uses HPE's AI-native automation and quality-assurance functions to simplify network operations, aiming for faster deployment and cost savings. The system is rounded out by the open-source AMD ROCm software stack and AMD Pensando network technology.
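To put the rack-level aggregates in perspective, a back-of-the-envelope calculation can break them down per GPU. This is a rough sketch based solely on the figures quoted above, assuming the totals divide evenly across the 72 accelerators (decimal units throughout); it is not an official per-GPU specification.

```python
# Per-GPU figures derived from HPE's quoted rack-level numbers,
# assuming the aggregates divide evenly across all 72 GPUs.
GPUS_PER_RACK = 72

rack_fp4_exaflops = 2.9   # FP4 compute per rack (AI exaflops)
rack_hbm4_tb = 31         # HBM4 capacity per rack (TB)
rack_scaleup_tbs = 260    # aggregate scale-up bandwidth (TB/s)
rack_mem_bw_pbs = 1.4     # aggregate memory bandwidth (PB/s)

per_gpu_fp4_pflops = rack_fp4_exaflops * 1000 / GPUS_PER_RACK
per_gpu_hbm4_gb = rack_hbm4_tb * 1000 / GPUS_PER_RACK
per_gpu_scaleup_tbs = rack_scaleup_tbs / GPUS_PER_RACK
per_gpu_mem_bw_tbs = rack_mem_bw_pbs * 1000 / GPUS_PER_RACK

print(f"FP4 compute per GPU:  ~{per_gpu_fp4_pflops:.0f} PFLOPS")
print(f"HBM4 per GPU:         ~{per_gpu_hbm4_gb:.0f} GB")
print(f"Scale-up BW per GPU:  ~{per_gpu_scaleup_tbs:.1f} TB/s")
print(f"Memory BW per GPU:    ~{per_gpu_mem_bw_tbs:.1f} TB/s")
```

The division works out to roughly 40 PFLOPS of FP4 compute, about 430 GB of HBM4, and several TB/s of scale-up bandwidth per accelerator.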
Also used as HPC frontend
According to HPE, the system is designed primarily for the data traffic of training huge models with trillions of parameters and for high inference throughput. Although announced as a turnkey solution, it is not aimed at the majority of AI users, quite the opposite. "The entry-level price for Rack-Scale Helios is currently still quite high, so I think adoption will be more on the side of model and service providers," says Chris Davidson, Vice President of HPC and AI at HPE. He sees the rack-scale solution as a complement to supercomputing: "The Helios rack could play an important role as a frontend system in the HPC sector."
Furthermore, the new rack is part of HPE's AI Factory concept: next-generation data centers designed specifically for AI. They serve as central AI hubs with integrated compute, storage, and network resources to deliver scalable high performance for complex AI tasks. Globally, they are interconnected into a large AI factory grid and offer a unified application environment.
While AI training is centralized in such AI factories or on supercomputers, inference is moving closer to the data, i.e., to the edge. HPE addresses this trend with new edge access points in the AI factory and the new NX 301 multiservice edge router. The goal is to shift more inference from the cloud to the edge to optimize latency, bandwidth, and costs.
(vbr)