Ever faster: DDR6, LPDDR6, GDDR7, HBM4 and PCIe 7.0 in the works

Hardware manufacturers are gearing up for ever faster memory types and interfaces: an up-to-date overview with predictions.


An Intel server with DDR5 memory modules; DDR6 is still some way off.

(Image: c’t)

This article was originally published in German and has been automatically translated.

The RAM types GDDR7 and LPDDR6 as well as PCI Express 6.0 herald the next generation change to even faster RAM and interfaces. Plans for HBM4, DDR6 and PCIe 7.0 are also already underway. An overview.

In all cases, the main aim is to achieve higher data transfer rates. This is because graphics processors and AI accelerators in particular benefit from fast RAM. And because AI accelerators are being used as neural processing units (NPUs) in more and more CPUs and systems-on-chip, faster RAM promises advantages. Unlike with CPU code or PC games, fast caches do not work as well with AI models several gigabytes in size because the caches are far too small for these tasks.

To achieve higher data transfer rates, changes are necessary, for example in the modulation process of the data signals. In addition, some generation changes are linked to each other. For example, future servers with PCIe 6.0 or 7.0 will also require significantly faster RAM so that the higher PCIe data transfer rate can be fully utilized – after all, the data has to flow somewhere.

In addition, the specifications of DDRx and LPDDRx of the same generation aim to be as similar as possible to save development costs.

The Double Data Rate (DDR) method transfers two data bits per clock cycle to increase the transfer rate. At some point the concept reaches its limits: too many transmission errors occur, signals can no longer bridge sufficiently long traces, or the chips' internal transceiver circuits (line drivers) become too complicated and expensive.

Many of the new RAM and interconnect generations therefore rely on methods that transmit more than one bit (0 or 1) per transfer, for example using pulse amplitude modulation (PAM) with three or four voltage levels (PAM3, PAM4). Additional or improved correction methods can also reduce the bit error rate (BER).
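The gain from multi-level signaling can be quantified: a symbol with n voltage levels carries log2(n) bits of information. A small illustrative sketch (not from the article) comparing classic two-level NRZ signaling with PAM3 and PAM4:

```python
import math

def bits_per_symbol(levels: int) -> float:
    """Information content of one symbol with the given number of voltage levels."""
    return math.log2(levels)

# NRZ (classic DDR signaling): 2 levels -> 1 bit per transfer
# PAM3: 3 levels -> ~1.58 bits, PAM4: 4 levels -> 2 bits
for name, levels in [("NRZ", 2), ("PAM3", 3), ("PAM4", 4)]:
    print(f"{name}: {bits_per_symbol(levels):.2f} bits/symbol")
```

PAM3 thus gives up some capacity compared with PAM4, but in return keeps a larger voltage distance between levels, which improves the signal-to-noise ratio.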

Depending on the application, different modulation and error correction methods are more suitable. DDR RAM involves many chips on modules sharing a common bus, which does not have very long lines but several interference points: DIMM sockets, CPU sockets, solder contacts. LPDDR can be clocked higher because the DRAM chips are soldered on directly or sit on an optimized Compression Attached Memory Module (CAMM). For RAM, low latencies matter more than for PCI Express, which in turn must bridge longer traces and even cables, ideally with as few signal lines as possible.

Micron has already announced GDDR7-SGRAM for graphics cards.

(Image: Micron)

GDDR7 chips could be launched as early as 2024, namely on Nvidia RTX 5000 "Blackwell" graphics cards. The JEDEC specification committee published the GDDR7 specification back in March.

While GDDR6 and GDDR6X transfer a maximum of 24 gigabits per second (Gbit/s), GDDR7 could start with 32 Gbit/s and increase to up to 48 Gbit/s over time. A single chip with 32 data signal lines (x32) would then deliver 192 GByte/s. A GPU with 512 memory lines would even reach 2 TByte per second with the first 32 Gbit/s chips. This is currently only possible with much more expensive high-bandwidth memory (HBM).

Micron is currently the only manufacturer of GDDR6X and uses PAM4 to transfer two bits per transfer. At JEDEC, however, the companies agreed on PAM3 for GDDR7. One 256-bit data word is encoded and transferred in eight consecutive transfer cycles. According to Cadence, PAM3 promises a better signal-to-noise ratio (SNR) and higher voltage tolerance than PAM4 and is therefore more resistant to interference.

Another new feature of GDDR7 is that one x32 channel can be split into four 8-bit channels. This can be advantageous if the GPU is working on different address ranges in parallel.

JEDEC is already working on LPDDR6 SDRAM, but has not yet given any dates. According to speculation, Qualcomm's Snapdragon 8 Gen 4 smartphone processor with powerful ARM cores, which is expected in the fall, could already be equipped for LPDDR6.

Some details about LPDDR6 have already been made public. Compared to the currently fastest LPDDR5X-9600 RAM, the transfer rate could increase to 10.667 to 14.4 Gbit/s (LPDDR6-10667, LPDDR6-14400). At first glance this seems like a small increase, but at the same time 24 bits per transfer will be used instead of 16. Instead of x16 and x32 chips, x24 chips are planned, in which this wider channel can be split into two x12 subchannels. Such subchannels also exist in DDR5 DIMMs, but there each is 32 bits wide.

The jump from 9.6 billion transfers at 16 bits each (9.6 GT/s × 2 bytes = 19.2 GByte/s) to 10.667 GT/s × 24 bits (32 GByte/s) would be considerable.
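The combined effect of the slightly higher transfer rate and the wider channel can be checked with a few lines of Python (an illustrative sketch, not from the article):

```python
def channel_rate_gbyte_s(gtransfers: float, bits: int) -> float:
    """Channel bandwidth in GByte/s from transfer rate (GT/s) and channel width (bits)."""
    return gtransfers * bits / 8

lpddr5x = channel_rate_gbyte_s(9.6, 16)     # x16 LPDDR5X-9600 channel
lpddr6 = channel_rate_gbyte_s(10.667, 24)   # x24 LPDDR6-10667 channel
print(f"LPDDR5X-9600 x16: {lpddr5x:.1f} GByte/s")
print(f"LPDDR6-10667 x24: {lpddr6:.1f} GByte/s ({lpddr6 / lpddr5x - 1:+.0%})")
```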

Low-power (LP) DDR SDRAM has long been used in notebooks as well as smartphones. With the LPCAMM2 design, pluggable LPDDRx modules are now also available, i.e. modules that can be subsequently replaced.

Some manufacturers also use LPDDR RAM to achieve particularly high RAM transfer rates through numerous channels, such as Apple with the Mx processors and Nvidia with the ARM server processor Grace.

Data transfer rates of current and upcoming RAM and PCIe types

Speed class                      Transfer rate per pin/lane   Per chip/module/card and direction
DDR5-5600                        5.60 Gbit/s                  44.8 GByte/s
DDR5-7200                        7.20 Gbit/s                  57.6 GByte/s
MCR-DIMM (rank multiplexing)     4.40 Gbit/s                  70.4 GByte/s
DDR5-8800                        8.80 Gbit/s                  70.4 GByte/s
DDR6-9600                        9.60 Gbit/s                  76.8 GByte/s
DDR6-17600                       17.60 Gbit/s                 140.8 GByte/s
DDR6-21000 (MCR?)                n/a                          168.0 GByte/s
LPDDR5X-9600 x16                 9.60 Gbit/s                  19.2 GByte/s
LPDDR5X-9600 x64 (x32 ×2)        9.60 Gbit/s                  76.8 GByte/s
LPDDR6-10667 x24 (x12 ×2)        10.67 Gbit/s                 32.0 GByte/s
LPDDR6-10667 x96 (x24 ×4)        10.67 Gbit/s                 128.0 GByte/s
GDDR6(X), x384                   24.00 Gbit/s                 1152.0 GByte/s
GDDR6(X), x512                   24.00 Gbit/s                 1536.0 GByte/s
GDDR7, x384                      32.00 Gbit/s                 1536.0 GByte/s
GDDR7, x512                      32.00 Gbit/s                 2048.0 GByte/s
GDDR7, x512                      48.00 Gbit/s                 3072.0 GByte/s
HBM3e, 6 stacks @ 0.8 TByte/s    n/a                          4800.0 GByte/s
HBM4, 8 stacks @ 1.5 TByte/s     n/a                          12000.0 GByte/s
PCIe 5.0 x1                      32 GT/s                      4.0 GByte/s
PCIe 5.0 x4                      32 GT/s                      16.0 GByte/s
PCIe 5.0 x16                     32 GT/s                      64.0 GByte/s
PCIe 6.0 x1                      64 GT/s                      8.0 GByte/s
PCIe 6.0 x4                      64 GT/s                      32.0 GByte/s
PCIe 6.0 x16                     64 GT/s                      128.0 GByte/s
PCIe 7.0 x1                      128 GT/s                     16.0 GByte/s
PCIe 7.0 x4                      128 GT/s                     64.0 GByte/s
PCIe 7.0 x16                     128 GT/s                     256.0 GByte/s
Ethernet 10G                     –                            1.2 GByte/s
Ethernet 200G                    –                            24.0 GByte/s
Ethernet 400G                    –                            48.0 GByte/s
Ethernet 800G                    –                            96.0 GByte/s
NVLink Gen 4 x18                 50 GT/s                      450.0 GByte/s
NVLink Gen 5 x18                 100 GT/s                     900.0 GByte/s

Little is yet known about DDR6 SDRAM, but preliminary work is apparently underway. A few months ago, presentation slides emerged according to which a working group is still planning a draft in 2024; a first specification should then appear in 2025. It is questionable whether DDR6-compatible processors will be available before 2027.

DDR5 has been specified by JEDEC up to DDR5-8800, and for certain servers there are also DIMMs with multiplexer combined ranks (MCR DIMMs), which achieve even higher transfer rates. According to speculation, DDR6 could likewise start at 8.8 Gbit/s (DDR6-8800) and initially be specified up to DDR6-17600, later also DDR6-21000.

It has probably not yet been decided whether DDR6 will use methods such as PAM3 or PAM4. It is also conceivable that DDR6, like LPDDR6, will stick with the previous DDR signaling and instead rely on wider channels. It could also be that DDR6 only allows one module per channel at very high frequencies (1 DIMM per channel, 1DPC), as is already the case with MCR DIMMs. JEDEC could incorporate MCR technology into the DDR6 standard and thus turn DDR6-10500 into DDR6-21000 via rank multiplexing. The resulting data transfer rate of 168 GByte/s per channel will probably only matter for special HPC servers in the long term.
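The effect of rank multiplexing is simple to model: the buffer on an MCR DIMM reads two ranks in parallel and interleaves them onto one faster host interface. A sketch in Python using the speculative figures above (illustrative, not from the article):

```python
def mcr_effective_rate_gts(per_rank_gts: float, ranks: int = 2) -> float:
    """Effective host-interface rate of an MCR module: the data buffer
    accesses several ranks in parallel and multiplexes their output."""
    return per_rank_gts * ranks

def channel_gbyte_s(gts: float, bits: int = 64) -> float:
    """Bandwidth of one 64-bit channel in GByte/s."""
    return gts * bits / 8

effective = mcr_effective_rate_gts(10.5)   # two DDR6-10500 ranks
print(effective)                           # 21.0 GT/s -> "DDR6-21000"
print(channel_gbyte_s(effective))          # 168.0 GByte/s per channel
```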

At Computex 2024, Nvidia's CEO Jensen Huang announced the Vera Rubin chip generation, i.e. the ARM processor "Vera" and the accompanying AI accelerator "Rubin". The latter is to use fourth-generation high-bandwidth memory (HBM4), with either eight stacks (Rubin) or even twelve in the case of Rubin Ultra.

Nvidia CEO Jensen Huang announced the AI accelerator generation "Rubin" with 8 or 12 HBM4 stacks.

(Image: c't / chh)

The (still) current Nvidia accelerator H200 "Hopper" with 144 GByte of HBM3e has six 24-GByte stacks with eight chip layers each (8-Hi HBM3e). Each stack delivers 0.8 TByte/s, so all six together deliver 4.8 TByte/s.

According to Micron, however, an HBM3e stack can achieve up to 1.2 TByte/s. With eight stacks, 9.6 TByte/s would already be possible with HBM3e; according to an older Micron roadmap, more than 1.5 TByte/s per stack is planned for HBM4 and then around 2 TByte/s for HBM4e, i.e. 25 to 66 percent more than with HBM3e. HBM4 could arrive in 2025 or more likely 2026.
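Total HBM bandwidth is simply per-stack bandwidth times stack count, which makes the generational jumps easy to compare. A brief sketch (illustrative figures from the article and Micron's roadmap, not a spec):

```python
def hbm_total_tbyte_s(per_stack_tbyte_s: float, stacks: int) -> float:
    """Aggregate bandwidth of an HBM configuration in TByte/s."""
    return per_stack_tbyte_s * stacks

print(f"{hbm_total_tbyte_s(0.8, 6):.1f}")   # H200: six HBM3e stacks at 0.8 TByte/s
print(f"{hbm_total_tbyte_s(1.2, 8):.1f}")   # hypothetical 8-stack design with fast HBM3e
print(f"{hbm_total_tbyte_s(1.5, 8):.1f}")   # Rubin-class configuration with HBM4
```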

The specification for PCI Express 6.0 with 64 GT/s per lane was published more than two years ago. Not only was PAM4 introduced, but also better error correction via Forward Error Correction (FEC) of defined data packets, so-called Flow Control Units (FLITs).

However, no PCIe 6.0 hardware is yet commercially available. According to speculation, Intel could introduce the Xeon 7 "Diamond Rapids" generation of server processors with PCIe 6.0 in 2025.

Due to the long delay with PCIe 4.0, the switch to PCIe 5.0 came relatively soon afterward. Now it appears that there will be several years between the generation changes. PCIe 7.0 hardware and 128 GT/s per lane would then be expected in 2027 or 2028 at the earliest.

The comparison of PCIe data transfer rates with RAM is skewed because a PCIe lane can transfer data in both directions simultaneously, whereas the memory controller of a CPU or GPU can only either read or write a RAM channel at any given moment. The table above therefore shows the data transfer rates per direction.
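The per-direction PCIe rates in the table follow a rule of thumb: roughly one payload byte per eight transfers per lane (FLIT and encoding overhead ignored). A Python sketch of that approximation, not from the article:

```python
def pcie_gbyte_s(gt_per_lane: float, lanes: int) -> float:
    """Approximate per-direction payload rate in GByte/s;
    assumes ~1 byte per 8 GT on each lane, ignoring protocol overhead."""
    return gt_per_lane * lanes / 8

for gen, rate in [("5.0", 32), ("6.0", 64), ("7.0", 128)]:
    print(f"PCIe {gen} x16: {pcie_gbyte_s(rate, 16):.0f} GByte/s per direction")
# A lane transfers in both directions at once, so the aggregate is twice this.
```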


(ciw)