Intel Xeon 6: Full steam ahead with 128 fast cores

With up to 128 performance cores, Intel's Xeon 6900P is meant to close the performance gap to AMD's Epyc CPUs with immediate effect. Initial measurements look promising.


The Xeon 6900P models are the first processors for Intel's giant LGA7529 socket.

(Image: Intel)


Right on time for the announced September 24, Intel fires the starting gun for its Xeon 6900P server processors. Unlike with competitor AMD's Epyc, the P in Xeon 6 stands for the fast performance cores, not for single-socket systems. Five models with 72 to a maximum of 128 cores are launching, and they can be combined in two-socket systems. Thanks to particularly fast memory using MRDIMM technology and their Advanced Matrix Extensions (AMX), they are supposed to be better suited to AI calculations than AMD's Epyc. Together with the Gaudi 3 AI accelerators, which also reach the market today, they are meant to form the pillars of Intel's portfolio for (AI) data centers.

We were able to test a dual Xeon 6980P system remotely in Intel's labs ahead of the launch and promptly set a few benchmark records for the c't lab (more on that below). But Intel's joy could be short-lived, because AMD is about to replace its fourth-generation Epyc processors.

With Xeon 6E, Xeon 6P and Gaudi 3, Intel essentially wants to cover the needs of the data center.

(Image: Intel)

Thanks to chiplet technology, which Intel now also uses after initially dismissing it, three compute dies ("XCC") and two I/O chips are combined on the huge CPU carrier (package). That is enough for up to 128 performance cores per processor. The first Xeon 6 models, by contrast, sent only the significantly weaker E-cores (efficiency cores) into the field, but packed 144 of them onto one chip and up to 288 into one package.

A look under Intel's Xeon 6900P: The three chiplets in the middle contain the CPU cores, the two on the outside the I/O functions.

(Image: Intel)

The compute dies are manufactured using the "Intel 3" process, a performance- and efficiency-improved version of Intel's 7-nanometer Intel 4 process, before the company aims to return to the fast lane in 2025 with Intel 18A. Two I/O dies flank the compute chiplets. Among other things, they provide 96 PCIe 5.0 lanes, six UPI links for connecting multiple processors, and the four accelerators DSA, IAA, QAT and DLB familiar from previous generations (see table). Unlike in the Xeon 6700E, all accelerators are enabled in all five Xeon 6900P models.

The Xeon 6900P compute dies are manufactured using the "Intel 3" production process. If you count, you will find 44 processor cores in the XCC chip.

(Image: Intel)

Anyone wondering how you get 128 cores out of three chips: yield optimization. Each of the three compute dies ("XCC", eXtreme Core Count) has 44 cores, of which only 43 are active on two dies and 42 on the third (43 + 43 + 42 = 128). This way Intel can also sell dies with a defect in one (or even two) cores, even in the fattest 6900P model. Incidentally, even the smaller models such as the 6960P and 6952P have three compute dies. Although two chips would suffice for the 6960P's core count alone, too much level 3 cache and, above all, four of the twelve memory channels would otherwise be lost.
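The binning arithmetic can be sketched in a few lines; the per-die deactivation counts for the 6980P (one disabled core on two dies, two on the third) come from the article, the function name is ours:

```python
def active_cores(cores_per_die: int, disabled_per_die: tuple[int, ...]) -> int:
    """Sum the active cores over all compute dies after binning."""
    return sum(cores_per_die - disabled for disabled in disabled_per_die)

# Xeon 6980P: three 44-core XCC dies, 1 + 1 + 2 cores fused off
print(active_cores(44, (1, 1, 2)))  # 128
```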

The Xeon 6900P only fits into the larger of the two Xeon 6 sockets, LGA7529. With the exception of the 6952P (400 watts), the 6900 series may swallow up to 500 watts. With a maximum of twelve memory channels (DDR5-6400) and optional support for multiplexed-rank DIMMs (MRDIMMs) at 8800 MT/s, the peak memory transfer rate is a whopping 845 GBytes per second per processor. In a 2P system, this adds up to over 1.6 TByte per second.
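The quoted 845 GByte/s follow directly from the channel count and transfer rate, since each DDR5 channel delivers eight bytes per transfer (2 × 32 data bits). A quick back-of-the-envelope check with the figures from the text:

```python
def peak_bandwidth_gbyte_s(channels: int, megatransfers: float,
                           bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GByte/s (1 GByte = 1e9 bytes)."""
    return channels * megatransfers * 1e6 * bytes_per_transfer / 1e9

print(peak_bandwidth_gbyte_s(12, 8800))      # 844.8 -> "845 GBytes per second"
print(2 * peak_bandwidth_gbyte_s(12, 8800))  # 1689.6 -> "over 1.6 TByte per second"
```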

Overview: Intel Xeon 6900P
Model  Cores  TDP  L3 cache  Clock: base (all-core turbo / max turbo)
6980P  128  500 watts  504 MByte  2.0 (3.2 / 3.9) GHz
6979P  120  500 watts  504 MByte  2.1 (3.2 / 3.9) GHz
6972P  96  500 watts  480 MByte  2.4 (3.5 / 3.9) GHz
6952P  96  400 watts  480 MByte  2.1 (3.2 / 3.9) GHz
6960P  72  500 watts  432 MByte  2.7 (3.8 / 3.9) GHz
All models: max. 2-socket systems, Hyper-Threading, 12 × DDR5-6400/MRDIMM-8800, DLB/DSA/IAA/QAT 4/4/4/4, 6 UPI links, 96 × PCIe 5.0

Since the pre-workshop for non-US journalists, which was supposed to take place as part of the canceled Intel Innovation 2024, fell through, Intel kindly granted us remote access to a sample system in its lab. It was equipped with two Xeon 6980P and 1.5 TByte of DDR5-8800 in the form of MRDIMMs. As the operating system we wanted Ubuntu Server 24.04 LTS (and that is what we got). The hardware was thus already close to the platform maximum.

We were allowed to carry out a few measurements in advance almost to our heart's content (only BMC access was withheld for security reasons) in order to gain a first impression. In addition to some low-level measurements, we were of course particularly interested in the blazingly fast RAM.

Probably thanks in part to the generous power budget of 500 watts per CPU, clock rates stayed at 2.15 GHz even under heavy AVX512 load, and mostly in the 2.3 to 2.6 GHz range. Earlier Xeon processors sometimes clocked well below 2 GHz at full load. Thanks to the two AVX512 units per core, the achievable compute performance was 23.59 trillion double-precision operations per second for the processor duo, i.e. just under 11.8 TFLOPS of FP64 per CPU. For comparison: the hitherto fastest AMD Epyc 9654 (96 cores and 360 watts per unit, AVX512 executed on 256-bit units, i.e. half the per-cycle throughput) achieved just under 10 TFLOPS as a duo.
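The headline number can be sanity-checked with a little arithmetic. A minimal sketch, assuming two 512-bit FMA units per P-core (eight FP64 lanes each) and counting each FMA as two floating-point operations; the function names are ours:

```python
def peak_fp64_tflops(cores: int, ghz: float,
                     fma_units: int = 2, fp64_lanes: int = 8) -> float:
    """Theoretical FP64 peak: each FMA unit does lanes x 2 ops (mul + add) per cycle."""
    return cores * ghz * fma_units * fp64_lanes * 2 / 1000

def implied_allcore_ghz(measured_tflops: float, cores: int) -> float:
    """Effective AVX512 clock that a measured FP64 throughput corresponds to."""
    return measured_tflops * 1000 / (cores * 2 * 8 * 2)

# Dual 6980P = 256 cores; the measured 23.59 TFLOPS would correspond to
# an effective all-core clock of roughly 2.9 GHz under these assumptions:
print(peak_fp64_tflops(256, 2.9))       # ~23.8 TFLOPS theoretical
print(implied_allcore_ghz(23.59, 256))  # ~2.88 GHz
```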

When compiling Linux kernel 6.9.12 including modules with GCC 14, the two Xeon 6980P finished 27 seconds ahead of the Epyc 9654 duo. The log file showed the same absolute gap for Clang 18, but in percentage terms the lead was slightly larger, because with this compiler the Epycs needed only 217 instead of 230 seconds. In the 3D rendering program Blender, the Xeons took 132 seconds for a frame of the complex scene "Lone Monk", again significantly faster than the Epycs' 181 seconds.

Intel's Xeon also met the high expectations for memory throughput raised by the configuration with DDR5-8800 MRDIMMs. In the MLC benchmark we measured a whopping 1.21 TByte/s with a stream-triad-like access pattern. In the all-read test, the system reached around 1.51 TByte/s, not far below the theoretical transfer rate of 1.69 TByte/s. The Epyc 9654, which likewise has twelve memory channels, delivered the second-fastest results of all server processors we have tested so far at 704 and 745 GByte/s respectively.
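Put into perspective, a small back-of-the-envelope calculation with the numbers from the text shows how close the measurements come to the theoretical limit of the dual-socket system:

```python
# Figures from the text, in GByte/s (1 TByte/s = 1000 GByte/s here)
theoretical = 2 * 844.8  # two sockets, 12 channels of MRDIMM-8800 each
all_read    = 1510.0
triad_like  = 1210.0

print(f"all-read:   {all_read / theoretical:.0%} of theoretical peak")   # ~89%
print(f"triad-like: {triad_like / theoretical:.0%} of theoretical peak") # ~72%
```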

We stumbled across a somewhat curious result with the 7-Zip compression program. Here the compression throughput of 591 MByte/s was disappointingly low. For comparison: the Epyc system cracked the 700 MByte/s mark, and even the Xeon Platinum 8490H (2 × 60 cores) was not far behind at 520 MByte/s. Our guess: it is down to the factory settings Intel uses for the NUMA configuration. Instead of setting up one memory domain per CPU socket, the Xeon 6900P ships with one NUMA node per compute die, in which that die's four memory controllers also reside. This is good for applications that can handle NUMA systems, so-called NUMA-aware software, because latencies are then lower.
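On Linux, this subdivision is visible without extra tools: each directory /sys/devices/system/node/nodeN exports its CPUs in the kernel's cpulist range notation. A small sketch for decoding it; the example cpulist is made up for illustration, not taken from the test system:

```python
def parse_cpulist(cpulist: str) -> list[int]:
    """Expand the kernel's cpulist notation, e.g. '0-2,85' -> [0, 1, 2, 85]."""
    cpus: list[int] = []
    for part in cpulist.split(","):
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus

# With one NUMA node per compute die, a dual 6980P should show six nodes
# of roughly 43 cores (86 logical CPUs with Hyper-Threading) each.
print(len(parse_cpulist("0-42,256-298")))  # 86 logical CPUs in one hypothetical node
```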

The NUMA or clustering mode "SNC3" offers lower memory latencies compared to "Hex".

(Image: Intel)

The number-crunching program y-cruncher, for example, also achieved top results because it relies on high AVX512 and memory performance. It has to handle NUMA systems well by design, as interesting problem sizes often occupy hundreds of GBytes. Our test scenario computes up to 250 billion decimal places of Pi and 100 billion of the lemniscate constant and occupies up to 1.09 TByte of memory.

An interesting detail: since there was no y-cruncher build optimized for the Xeon 6900P aka Granite Rapids, we experimented a little. It turned out that for larger digit counts, the binary optimized for AMD's Zen 5 was faster than the most recent one for Intel processors with AVX512.


(csp)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.