Mobile HBM: Super-fast DRAM for (Apple) smartphones with AI

"Low Power Wide I/O" memory chips with many data lines are designed to combine high data transfer rates with low power consumption.


LPDDR5X memory chips on a notebook mainboard

(Image: c’t Magazin, Florian MĂĽssig)


Samsung, the global memory chip market leader, and probably SK Hynix as well are developing variants of low-power DDR (LPDDR) SDRAM with extremely high data transfer rates. These chips have far more data lines than current LPDDR5X and upcoming LPDDR6 devices. Because the concept borrows from High Bandwidth Memory (HBM), which the fastest AI accelerators from Nvidia and AMD use, the new mobile memory could end up being called Mobile HBM.

However, names such as Low Power Wide I/O (LPW) have also emerged. At the Samsung Memory Summit 2023, there was talk of Low Latency Wide I/O (LLW).

The underlying concept is not new: more than ten years ago, the industry body JEDEC published standards for Wide I/O and Wide I/O 2 memory chips with up to 512 data signal lines. Wide I/O DRAM was used, among other devices, in the PlayStation Vita mobile games console.

According to speculation, Mobile HBM or LPW DRAM could appear from 2027, for example in iPhones and other smartphones with more powerful AI computing units. LPDDR6 DRAM should also be ready for the market by then, and since it is not yet clear whether LPW stacks will be built from LPDDR5X or LPDDR6 dies, the maximum data transfer rates of future LPW stacked chips are hard to estimate.

Current LPDDR5X-8500 chips with 16 data lines transfer 17 GByte/s: 8.5 billion transfers per second at 2 bytes each. An LPW chip with a total of 256 or 512 signal lines (32 or 64 bytes per transfer), built internally from several stacked LPDDR5X-8500 dies, would therefore deliver up to 272 or 544 GByte/s respectively.
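
As a back-of-the-envelope check, these figures follow directly from transfer rate times bus width; the short sketch below reproduces the numbers above. Note that the 256- and 512-line LPW bus widths are the speculated values discussed here, not a published specification.

```python
# DRAM peak bandwidth: transfer rate (GT/s) x bytes moved per transfer.
def bandwidth_gbyte_s(gigatransfers_per_s: float, data_lines: int) -> float:
    """Peak transfer rate in GByte/s for a bus with the given number of data lines."""
    bytes_per_transfer = data_lines / 8
    return gigatransfers_per_s * bytes_per_transfer

print(bandwidth_gbyte_s(8.5, 16))    # LPDDR5X-8500, 16 lines:      17 GByte/s
print(bandwidth_gbyte_s(8.5, 256))   # speculated LPW, 256 lines:  272 GByte/s
print(bandwidth_gbyte_s(8.5, 512))   # speculated LPW, 512 lines:  544 GByte/s
```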

For comparison: An Apple M4 Pro with several LPDDR5X channels achieves 273 GByte/s, an Nvidia GeForce RTX 5060 (Ti) with GDDR7 memory achieves 448 GByte/s. High-end graphics cards achieve far more than one TByte/s.

LPDDR5X is specified up to 9.6 gigatransfers/s (GT/s), which would make over 600 GByte/s possible with LPW. Samsung would like to push "LPDDR5 Ultra Pro" to 12.7 GT/s.

LPDDR6 is expected to start at 10.667 GT/s, which would result in 682 GByte/s over 512 data lines. However, LPDDR6 chips are expected to be organized so that they process 24 instead of 16 bits per channel (two sub-channels of 12 bits each). LPW stacks based on LPDDR6 dies could therefore use 288 or 576 data lines.
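
The same arithmetic applies to the faster grades; the channel width (24 bits per LPDDR6 channel) and the line counts below are taken from the figures above.

```python
# Same calculation as before: transfer rate (GT/s) x bytes per transfer.
print(9.6 * 512 / 8)      # LPDDR5X top grade, 512 lines: 614.4 GByte/s ("over 600")
print(10.667 * 512 / 8)   # LPDDR6 entry grade, 512 lines: ~682 GByte/s

# LPDDR6 channels are 24 bits wide (two 12-bit sub-channels), so LPW stacks
# built from LPDDR6 dies would use multiples of 24 data lines:
print(12 * 24, 24 * 24)   # 288 and 576 data lines
```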

LPDDR(X) memory chips for mobile devices often consist of several individual chips (dies) stacked on top of each other in a common package. The wafers carrying the individual dies are ground thin beforehand, so that a single die may be only around 50 micrometers (0.05 millimeters) thick.

Micrograph of a NAND flash stack with 16 dies, each around 40 micrometers thick. Classic wire bonding is sufficient for the signal frequencies in a (micro) SD card.

(Image: TechInsights)

Bond wires, for example, can be used to electrically connect the stacked dies to the base carrier. To provide many lines for very high signal frequencies, however, through-silicon vias (TSVs), which run vertically through the die, are the better choice. Several hundred TSVs fit on one square millimeter.

LPDDR DRAM dies with laterally arranged contacts are easier and cheaper to manufacture. If they are stacked with a slight offset so that each one protrudes a little, every die can be connected directly to the base die (redistribution layer, RDL) via short vertical connections. SK Hynix has developed its Vertical Fan-Out (VFO) technology for this purpose.

Construction of an LPDDR die stack with Vertical Fan-Out (VFO)

(Image: SK Hynix)

In notebooks, LPDDR memory chips are typically soldered close to the main processor on the mainboard. With LPCAMM/LPCAMM2 there is also a pluggable module version.

For Mobile HBM, however, it may be necessary to stack the RAM package directly on top of the processor. This is the only way to keep the many lines short enough to avoid excessive errors at high signal frequencies. Silicon interposers that connect the CPU SoC and LPW DRAM side by side are also conceivable.

In AI computing accelerators for servers, GPU and HBM stacks also sit on interposers. However, HBM uses 1024 data lines per stack and several stacks per GPU. Eight HBM3e stacks together deliver around 8 TByte/s.
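
As a rough plausibility check (back-of-the-envelope arithmetic only, not the exact HBM3e specification): 1024 data lines mean 128 bytes per transfer, so "around 8 TByte/s" from eight stacks corresponds to roughly 1 TByte/s per stack and an effective per-pin rate in the ballpark of 8 GT/s.

```python
# Back-of-the-envelope: what ~8 TByte/s from eight 1024-line HBM stacks implies.
total_gbyte_s = 8000            # "around 8 TByte/s" across the whole GPU
stacks = 8
bytes_per_transfer = 1024 / 8   # 1024 data lines per stack -> 128 bytes per transfer

per_stack_gbyte_s = total_gbyte_s / stacks              # ~1000 GByte/s per stack
effective_gt_s = per_stack_gbyte_s / bytes_per_transfer # effective per-pin rate
print(per_stack_gbyte_s, round(effective_gt_s, 1))      # 1000.0 GByte/s, ~7.8 GT/s
```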


Samsung and SK Hynix are also working on a specification that lets processors control arithmetic units integrated into RAM chips. With processing-in-memory (PIM), memory chips could return the results of (simple) arithmetic or search operations instead of just raw data, which in principle saves bus transfers and energy.
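
As a purely conceptual sketch (the class and method names below are invented for illustration and do not correspond to any published PIM interface): instead of shipping every value across the memory bus so the CPU can reduce it, a PIM-capable chip would return only the result of the operation.

```python
# Conceptual illustration of processing-in-memory (PIM); classes and methods
# are hypothetical, not an actual vendor API.

class ConventionalDram:
    def __init__(self, data):
        self.data = data

    def read_all(self):
        # Models conventional DRAM: every element would cross the memory bus
        # before the CPU can reduce it.
        return list(self.data)


class PimDram(ConventionalDram):
    def sum_region(self):
        # Models a simple arithmetic unit inside the memory chip that returns
        # only the result, so far fewer bus transfers would be needed.
        return sum(self.data)


values = range(1_000_000)
print(sum(ConventionalDram(values).read_all()))  # reduction on the CPU side
print(PimDram(values).sum_region())              # reduction inside the memory
```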

(ciw)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.