Nvidia Feynman comes with stacked GPU dies and custom HBM
Nvidia's next-but-one AI accelerator, Feynman, will be much smaller: from 2028, its GPU dies will be stacked on top of each other instead of sitting side by side.
Stacked logic chips are set to become a reality in 2028. Nvidia intends to stack multiple GPU dies on top of each other for Feynman, its next-but-one AI accelerator generation. CEO Jensen Huang confirmed this at the opening of the GTC 2026 trade show (from 2:12:33 in the video).
The Feynman sketch on Nvidia's roadmap therefore looks significantly smaller than those of the next two AI accelerators, Rubin and Rubin Ultra. In those designs, GPU dies and memory stacks sit next to each other, with a silicon interposer providing the data connections; manufacturers call this construction 2.5D stacking.
(Image: Nvidia)
Heat dissipation problematic
3D stacking with multiple logic chips on top of each other has advantages, especially in signal routing. However, chip manufacturers have so far been unable to solve one problem for mass production: dissipating the heat of the lower dies. Cooling will be particularly challenging for Feynman, as the accelerator could exceed 2000 watts of electrical power consumption. Nvidia has not yet commented on the details.
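A back-of-envelope calculation illustrates why stacked logic dies are so much harder to cool than side-by-side ones. The die count and die area below are purely illustrative assumptions, not Nvidia specifications; only the roughly 2000-watt figure comes from the article:

```python
# Rough sketch: average heat flux through the cooled top surface.
# Assumptions (hypothetical, NOT Nvidia specs): a 2000 W accelerator
# built from two GPU dies of 800 mm^2 each.

TOTAL_POWER_W = 2000.0  # article: consumption "could exceed 2000 watts"
DIE_AREA_MM2 = 800.0    # assumed area of a single GPU die
NUM_DIES = 2            # assumed number of GPU dies

def heat_flux_w_per_cm2(power_w: float, area_mm2: float) -> float:
    """Power divided by the surface available to the cooler, in W/cm^2."""
    return power_w / (area_mm2 / 100.0)  # 100 mm^2 = 1 cm^2

# 2.5D (side by side): each die presents its own surface to the cooler.
side_by_side = heat_flux_w_per_cm2(TOTAL_POWER_W, NUM_DIES * DIE_AREA_MM2)

# 3D (stacked): all the power must leave through a single die footprint,
# and the lower die's heat additionally has to pass through the die above.
stacked = heat_flux_w_per_cm2(TOTAL_POWER_W, DIE_AREA_MM2)

print(f"2.5D side by side: {side_by_side:.0f} W/cm^2")  # 125 W/cm^2
print(f"3D stacked:        {stacked:.0f} W/cm^2")       # 250 W/cm^2
```

Under these assumptions, stacking doubles the heat flux through the same footprint, which is why the cooling of the lower dies is the sticking point for mass production.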
So far, 3D stacking has only been used on a larger scale with cache chiplets. For example, contract manufacturer TSMC stacks CPU chiplets and Level 3 cache for AMD's Ryzen X3D processors. In that case, the memory generates little waste heat, so cooling works. AMD is also researching more complex 3D stacking constructions.
First generation with custom HBM
In addition to the stacked design, Nvidia plans to use custom High-Bandwidth Memory (cHBM) for the first time with Feynman. This is an initiative by the memory manufacturers Samsung, SK Hynix, and Micron, as well as suppliers like Marvell: with cHBM, customers such as Nvidia can design their own logic for controlling the memory stacks and integrate it into their own processors or GPUs.
Previously, this logic always sat in a base die that the memory manufacturers produced and placed underneath the DRAM layers. The biggest disadvantage: the memory manufacturers' process technology is specialized for DRAM, not logic. If the base die's transistors move into a CPU or GPU instead, a logic-focused foundry like TSMC can produce them, which potentially saves space and increases efficiency. Additionally, customers can tailor the cHBM control to their own needs.
In addition to Feynman, a wealth of new chips is slated for 2028: Nvidia's own ARM processor Rosa, the Bluefield-5 network processor, several switches, and, in cooperation with Groq, the LP40 AI accelerator specialized for inference.
(mma)