ISC gleanings: Scorn for Intel's Aurora, praise for Nvidia's Grace Hopper

The Aurora supercomputer may have passed the exaflops mark, but hardware partner Intel still came off badly at the ISC supercomputing conference.

Rendering of the inside of a data center

(Image: IM Imagery/Shutterstock.com)

By Andreas Stiller
This article was originally published in German and has been automatically translated.

The ISC High Performance trade fair in Hamburg, which focuses on supercomputers, has come to an end. Platinum sponsor Intel had probably hoped for a sunrise atmosphere, as its exaflops "sun" Aurora has finally risen over the horizon after a six-year delay. However, it has obviously not developed any real radiance. If Aurora and its Intel processors and accelerators were mentioned at all at the ISC, it was mostly in a negative, often even mocking way. After all, its energy efficiency is rather embarrassing compared to AMD and Nvidia systems, at least in the 64-bit Linpack benchmark. And the originally targeted 2 exaflops were missed by a wide margin. There is also plenty of room for optimization in the HPCG benchmark.

The Grace Hopper systems with Nvidia processors received much more attention, especially as they could also be admired in hardware at the exhibition, mostly in servers with boards from Gigabyte.

HPE-Cray is at the forefront across all platforms. The server manufacturer proudly displayed its blades, including the brand-new ones for Grace Hopper (with plastic pipes for water cooling) and models for AMD's Epyc 9004 (Genoa) as well as the newly introduced MI300 series. Also on show were HPE racks from the exascale supercomputer El Capitan, which reached 48th place in the Top500 list. El Capitan itself is expected to achieve at least 2 exaflops.

The water-cooled Grace Hopper Superchip Blade EX254n from HPE-Cray.

(Image: heise online / as)

But an Aurora blade at the event? HPE said: ask Intel. Intel said: ask the Leibniz Supercomputing Center. And there you could indeed see at least one blade of the "small Aurora", with four instead of six Ponte Vecchio accelerators.

At least the blade of the small Aurora could be admired at the Leibniz Supercomputing Center.

(Image: heise online / as)

Six large research institutions reported on their first experiences with Grace CPUs, both with and without an attached Hopper GPU, and these were unanimously positive: the University of Bristol (Isambard 3), Los Alamos National Laboratory (Venado), TACC (Vista), CSCS (Alps), Jülich (Jupiter) and the Japanese Joint Center for Advanced HPC (Miyabi-G). Porting the software reportedly turned out to be "relatively simple". OpenMPI and other MPI stacks (except Intel's) posed no problem. The Japanese also recommend OpenACC. Their list of tried-and-tested HPC applications consisted of 90 percent Fortran programs, most of which use the double-precision floating-point format (FP64). Was Intel right in its decision to dispense with double-precision matrix engines in Ponte Vecchio?
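To give an idea of the directive-based porting approach the sites describe, here is a minimal, generic OpenACC loop in C. It is not taken from any of the reported codes; the FP64 axpy kernel, the compiler and the flags mentioned in the comment are purely illustrative assumptions.

```c
/* Generic OpenACC sketch: an FP64 axpy loop offloaded to the GPU.
 * On a Grace Hopper node this could be built with the NVIDIA HPC
 * compilers, e.g. "nvc -acc -gpu=cc90 axpy.c" (illustrative only). */
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

int main(void) {
    double *x = malloc(N * sizeof(double));
    double *y = malloc(N * sizeof(double));
    const double a = 2.0;

    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* Copy the arrays to the accelerator, run the loop there in
     * parallel, and copy y back afterwards. */
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);   /* expected: 4.0 */
    free(x); free(y);
    return 0;
}
```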

A single rack of the Swiss Alps system with 512 Grace Hopper superchips is designed for 340 kilowatts, three times as much as Piz Daint before. Under full load, a superchip draws 650 watts - and there are other consumers in the rack as well. So the chip power had to be limited to 570 watts by power capping. Speaking of saving energy: according to CSCS, the superchips are not exactly economical when idle. Idling, the computer draws more power than Piz Daint previously did under full load. But these valuable systems are rarely supposed to run without a computing load ...
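Rough arithmetic illustrates the constraint (a back-of-the-envelope calculation, not figures from CSCS): 512 superchips at 650 watts each would already draw about 333 kilowatts, leaving almost nothing of the 340-kilowatt rack budget for network, pumps and fans. Capped at 570 watts, the superchips account for roughly 292 kilowatts, leaving around 48 kilowatts of headroom.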

A typical program at Los Alamos National Laboratory. Per core, the superchip is just behind Intel's Sapphire Rapids, but in the dual version it offers 144 cores instead of just 110.

(Image: heise online / as)

The Jupiter Booster with Grace Hopper superchips is to become the first exascale system in Europe as part of the EuroHPC Joint Undertaking (EuroHPC JU) supercomputing initiative. With only a half-equipped rack called the Jupiter Exascale Development Instrument (JEDI), built by Eviden, the Jülich Supercomputing Centre has already been able to show what the system is capable of - and has taken the lead in energy efficiency with first place in the Green500 list. Fully equipped, a rack contains 48 nodes, each with four superchips - a total of 140 racks adds up to almost two million ARM Neoverse V2 cores. Installation is now said to be in full swing. A Jupiter cluster with 2600 European Rhea-1 processors from SiPearl (based on the older ARM Neoverse V1) is also planned. But this is still a long way off, with SiPearl only saying the usual "soon".
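The core count is easy to check (my own calculation, assuming the 72 Neoverse V2 cores per Grace CPU of the GH200 superchip): 48 nodes × 4 superchips × 72 cores come to roughly 13,800 cores per rack, and 140 racks therefore to about 1.9 million cores.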

However, other processors for high-performance computing are also being designed in Europe, such as a processor specially designed as a stencil and tensor accelerator (STX) by Fraunhofer ITWM in collaboration with ETH Zurich. A wide range of applications can benefit from it, from fluid dynamics and climate and weather forecasting to imaging. Four such STX processors are installed on a PCIe 5.0 card, together with 64 GBytes of high-bandwidth memory (HBM); each one runs a small Linux. The concept is therefore somewhat similar to NEC's SX-Aurora Tsubasa vector computer. It is to be programmed conveniently via OpenMP offloading. The design is already finished, as is the PCIe card, but production of the chips (in a 12-nanometer process) is still a long way off. A marketing company called UNEEC Systems has also already been founded, with the Chief Strategy Officer of Fraunhofer ITWM, Dr. Franz-Josef Pfreundt, as CEO. The young company is still looking for investors ...
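As a rough illustration of what OpenMP offloading looks like in practice - the STX toolchain itself is not described in detail in the article, so this is only a generic C sketch with arbitrary array sizes and coefficients, not STX code:

```c
/* Generic OpenMP target-offloading sketch: a 1-D three-point stencil
 * executed on an accelerator device. Sizes and coefficients are
 * arbitrary assumptions for illustration. */
#include <stdio.h>

#define N 4096

int main(void) {
    static double in[N], out[N];
    for (int i = 0; i < N; i++) in[i] = (double)i;

    /* Map the input to the device, run the stencil loop there,
     * and map the result back to the host. */
    #pragma omp target teams distribute parallel for \
            map(to: in[0:N]) map(from: out[0:N])
    for (int i = 1; i < N - 1; i++)
        out[i] = 0.25 * in[i - 1] + 0.5 * in[i] + 0.25 * in[i + 1];

    printf("out[1] = %f\n", out[1]);
    return 0;
}
```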

The board for the stencil and tensor accelerator is finished and is just waiting for the processors, which are still in production.

(Image: heise online / as)