AMD Epyc 9005 server CPU: First test confirms high performance and efficiency

With up to 16 Zen 5 compute chiplets and 192 cores in the Epyc 9005, AMD wants to reclaim the server crown. The first c't benchmarks partially confirm this.

Two Epyc processors on one mainboard

(Image: c't)


The fifth generation of AMD's Epyc server processors, codenamed Turin, has launched with a lineup of 27 models. In principle they fit into the SP5 socket known from the predecessor, but they require a BIOS update, and due to the increased thermal design power (TDP) not all models run in all existing boards.

With the extra power, doubled AVX-512 throughput, higher per-clock performance thanks to the familiar Zen 5 architecture and up to 192 CPU cores, the performance crown for dual-socket servers should return to AMD, after Intel's Xeon 6980P managed to beat the previous Epyc generation in some benchmarks.

The configuration confirms almost all the assumptions c't made based on the preview shown at Computex: the I/O die is identical to that of the Epyc 9004, and the fast "Classic" cores from TSMC's 4-nanometer production top out at 128, spread across 16 CCDs of eight CPU cores each. Everything above that automatically uses Zen 5c cores, manufactured by TSMC in 3-nanometer technology. These are bundled into 12 CCDs of 16 cores each and have to make do with half as much level 3 cache per core. They also clock lower, reaching at most 3.7 GHz instead of turbo clocks of 4.1 to 5.0 GHz, but are otherwise functionally identical, including the full AVX-512 throughput. AMD configured this differently in the mobile APUs of the Ryzen AI 300 series, and Intel does not give its E-cores the full feature set either.

Compared with "Bergamo", AMD's first generation with Zen 4c cores, the clock rate of the compact cores has risen by 600 MHz, or around 19 percent. The L3 cache of the c-CCDs is also no longer split into two halves whose attached cores could only communicate with each other via the I/O die.
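
As a quick cross-check against the table at the end of the article, the core and cache-per-core figures follow directly from the CCD configuration (the 32 MByte of L3 per CCD is implied by the listed totals):

```latex
% Classic CCDs (Zen 5), e.g. Epyc 9755:
16 \times 8 = 128\ \text{cores}, \qquad \tfrac{512\,\text{MByte}}{128} = 4\,\text{MByte L3 per core}
% Compact CCDs (Zen 5c), e.g. Epyc 9965:
12 \times 16 = 192\ \text{cores}, \qquad \tfrac{384\,\text{MByte}}{192} = 2\,\text{MByte L3 per core}
```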

The GMI3 chiplet interface now has a "wide" mode which, in Epyc 9005 configurations with at most eight compute dies, can transfer 32 bytes to a CCD twice per clock cycle (read direction). The write rate nominally remains at 16 bytes per clock cycle, but can rise to 25 bytes for very one-sided loads. In addition, the DRAM Runtime Post-Package Repair function can now also handle larger x8 DIMMs instead of just x4 modules and can retire memory rows recognized as defective.
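
A rough back-of-the-envelope figure for what the wide mode means per CCD link; the article does not state the Infinity Fabric clock (FCLK), so the 2 GHz used here is purely an illustrative assumption:

```latex
% Read: 2 transfers of 32 bytes per FCLK cycle (wide mode), assumed f_{FCLK} = 2 GHz
BW_{read} = 2 \times 32\,\text{B} \times f_{FCLK} \approx 64\,\text{B} \times 2\,\text{GHz} = 128\,\text{GByte/s}
% Write: nominally 16 B per cycle, up to 25 B for one-sided loads
BW_{write} = 16\,\text{B} \times 2\,\text{GHz} = 32\,\text{GByte/s} \quad (\text{up to} \approx 50\,\text{GByte/s})
```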

Despite using the same I/O die as the Epyc 9004, the Epyc 9005 brings a few new features.

(Image: AMD)

The model range extends from the 192-core flagship for 14,813 US dollars down to the 8-core part costing 527 US dollars. There are also special versions such as the frequency-optimized Epyc 9175F, which has the full 512 MByte of L3 cache but only a single active core in each of its 16 CCDs, clocking at up to 5 GHz.

Epyc 9005 models with 3D V-Cache are still missing from the portfolio for now. Among the 27 processors presented are five with the more compact Zen 5c cores, five models optimized for particularly high frequencies, recognizable by the suffix "F", and four Epycs with the suffix "P", which only work on their own, i.e. without a second socket. AMD somewhat hides the CPUs with Zen 5c cores in the naming scheme; these are the 9965, 9845, 9825, 9745 and 9645. You can find all the details in the table at the end of the article.

AMD puts the Epyc 9005 up to 17 percent ahead of the previous generation in classic SPEC workloads; in AI and HPC, where the doubled AVX-512 throughput comes into play, the company claims up to 37 percent more performance. The comparisons with Intel's older Emerald Rapids generation, which tops out at 64 cores, are naturally lopsided, as the new Epycs have up to three times as many cores. Even with the core count limited to 64, they are up to 60 percent ahead of the older CPUs.

As you may have guessed from the cover picture of this article, an Epyc 9005 reference system called "Volcano" is already in the c't lab. It is equipped with 1.5 TByte of DDR5-6400R memory, which is, however, clocked down to 6000 MT/s in accordance with AMD's reference specification (see below). We have also received three processor pairs for testing; you can read the results in an upcoming issue of c't and on heise online.

A few preliminary impressions are already possible, though. The new Epycs are clearly ahead of Intel's Xeon 6980P in the theoretical compute throughput of the vector units and in highly parallelizable 3D rendering with Blender. For the dual Xeon with 128 P-cores each, we measured up to 23.6 TFlops in double-precision AVX-512 fused-multiply-add operations. The Epyc 9755 with the Classic cores achieved 26.3 TFlops at an identical TDP of 500 watts, around 11 percent more throughput. At 32.8 TFlops, the two Epyc 9965s came in around a third higher, demonstrating the high efficiency potential of AMD's Zen 5 compact cores.
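
As a plausibility check, the sustained all-core clock implied by these numbers can be derived. This assumes two 512-bit FMA pipes per Zen 5 core, i.e. 32 double-precision floating-point operations per cycle, which fits the "full AVX-512 throughput" mentioned above but is our assumption, not a figure from AMD:

```latex
% Dual Epyc 9965: 2 x 192 = 384 cores
f \approx \frac{32.8 \times 10^{12}\ \text{Flops}}{384 \times 32\ \text{Flop/cycle}} \approx 2.67\ \text{GHz}
% Dual Epyc 9755: 2 x 128 = 256 cores
f \approx \frac{26.3 \times 10^{12}\ \text{Flops}}{256 \times 32\ \text{Flop/cycle}} \approx 3.21\ \text{GHz}
```

Both values lie plausibly between the respective base and boost clocks (2.25/3.7 GHz and 2.7/4.1 GHz), suggesting the chips were not severely power-throttled in this test.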

In the Blender scene Lone Monk, rendering took 88 seconds with the dual Epyc 9965. The dual Epyc 9755 with 128 cores each needed 100 seconds, the dual Xeon 6980P with 128 cores each 132 seconds.


In terms of memory transfer rate, however, the Epycs with their twelve DDR5-6000 channels are inferior to the same number of channels on the Xeon, which runs MRDIMMs at 8800 MT/s. In the STREAM-triad-like measurement of Intel's Memory Latency Checker (MLC 3.11a), the Epyc pairs achieved just under 900 GByte/s, while the Intel platform managed just over 1200 GByte/s. The Xeon also came out ahead in idle memory latency.
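
For readers who want to get a feel for the access pattern behind such numbers, here is a minimal sketch of a STREAM-triad-style kernel with OpenMP. It is not the MLC tool used in the test; array size, thread placement and repetition count are simplified assumptions, and a serious measurement would repeat the kernel many times, pin threads per NUMA node and use arrays far larger than the combined caches:

```c
/* Minimal STREAM-triad-style bandwidth sketch (illustrative only,
 * not the Intel MLC tool used in the test).
 * Build e.g. with: gcc -O3 -fopenmp -o triad triad.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1UL << 28)   /* 268 million doubles per array, about 2 GiB each */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c) return 1;

    /* First-touch initialization so pages land on the local NUMA node */
    #pragma omp parallel for
    for (size_t i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (size_t i = 0; i < N; i++)
        c[i] = a[i] + 3.0 * b[i];      /* triad: two reads, one write */
    double t1 = omp_get_wtime();

    /* three arrays of N doubles are touched by the kernel */
    double gbytes = 3.0 * N * sizeof(double) / 1e9;
    printf("triad bandwidth: %.1f GByte/s\n", gbytes / (t1 - t0));

    free(a); free(b); free(c);
    return 0;
}
```

On a dual-socket system the result depends heavily on how threads and memory are spread across the NUMA nodes, which is why dedicated tools such as MLC handle pinning and access patterns explicitly.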

Things are correspondingly tight in mixed workloads, where the transfer rate plays a bigger role. Here AMD has to rely on the better latencies between the CPU cores, which in some cases can compensate for part of the bandwidth disadvantage.

When compiling the Linux kernel including modules (option -m) with GCC 14, the Epycs were clearly ahead of the Xeons, with the 128 Classic cores still beating the 192 compact cores (141 versus 187 seconds); the Xeons needed 203 seconds. With the Clang compiler v18 it was 132, 165 and 190 seconds, in the same order. The gap between the Epycs shrinks here, so Clang seems to make better use of the core flood than GCC.

However, the Xeon 6980P is clearly ahead in the number cruncher y-cruncher, which demands a high memory transfer rate in addition to intensive AVX-512 use. When computing Pi to 100 billion decimal places, the Epycs needed 94 and 87 percent longer, respectively. The lemniscate constant, one of several mathematical constants y-cruncher can approximate, shows similar, though less pronounced, behavior.

While AMD presented the Epyc 9005 at Computex with DDR5-6000 support, the same slide now lists DDR5-6400, albeit with an asterisk: AMD validates platforms for the faster DDR5 variant on customer request, especially as suitable JEDEC-compliant memory, i.e. non-overclocked RDIMMs, is already readily available. The reference specification, however, remains DDR5-6000, which is how our test system was configured. For the detailed test, we will check whether our system runs stably at the higher frequency and how much extra performance it delivers.

Disclaimer: AMD covered the author's travel and accommodation costs to the "Advancing AI 2024" event.


Overview: AMD Epyc 9005 "Turin"
Model Cores Base/boost clock (GHz) TDP (Watt) cTDP (Watt) L3 cache (MByte) L3 cache/core (MByte) MSRP (US-$)
9965* 192 2.25 / 3.7 500 450-500 384 2.0 14813
9845* 160 2.1 / 3.7 390 320-400 320 2.0 13564
9825* 144 2.2 / 3.7 390 320-400 384 2.7 13006
9755 128 2.7 / 4.1 500 450-500 512 4.0 12984
9745* 128 2.4 / 3.7 400 320-400 256 2.0 12141
9655 96 2.6 / 4.5 400 320-400 384 4.0 11852
9645* 96 2.3 / 3.7 320 320-400 256 2.7 11048
9565 72 3.15 / 4.3 400 320-400 384 5.3 10486
9555 64 3.2 / 4.4 360 320-400 256 4.0 9826
9535 64 2.4 / 4.3 300 240-300 256 4.0 8992
9455 48 3.15 / 4.4 300 240-300 256 5.3 5412
9365 36 3.4 / 4.3 300 240-300 192 5.3 4341
9355 32 3.55 / 4.4 280 240-300 256 8.0 3694
9335 32 3.0 / 4.4 210 200-240 128 4.0 3178
9255 24 3.25 / 4.3 200 200-240 128 5.3 2495
9135 16 3.65 / 4.3 200 200-240 64 4.0 1214
9115 16 2.6 / 4.1 125 ?-195 64 4.0 726
9015 8 3.6 / 4.1 125 ?-195 64 8.0 527
9575F 64 3.3 / 5.0 400 320-400 256 4.0 11791
9475F 48 3.65 / 4.8 400 320-400 256 5.3 7592
9375F 32 3.8 / 4.8 320 320-400 256 8.0 5306
9275F 24 4.1 / 4.8 320 320-400 256 10.7 3439
9175F 16 4.2 / 5.0 320 320-400 512 32.0 4256
9655P 96 2.6 / 4.5 400 320-400 384 4.0 10811
9555P 64 3.2 / 4.4 360 320-400 256 4.0 7983
9455P 48 3.15 / 4.4 300 240-300 256 5.3 4819
9355P 32 3.55 / 4.4 280 240-300 256 8.0 2998
* Zen 5c CCDs, F = frequency-optimized, P = single-socket operation only

(csp)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.