Google introduces two TPUs as alternatives to Nvidia's AI accelerators

The eighth TPU generation comes in two variants. Both forgo AMD and Intel CPUs: Google instead pairs them with its own Arm processors.

Close-up of a TPU 8i on a mainboard

(Image: Google)


Google's eighth generation of Tensor Processing Units (TPUs) comes with a twist: it arrives in two versions, one optimized for training AI models (TPU 8t) and one for running them, i.e. inference (TPU 8i). In addition, Google is pairing the TPUs for the first time with its own Arm processors (Axion).

Both TPU variants share some advances, such as a doubled chip-to-chip transfer rate of 19.2 Tbit/s and support for the particularly compact FP4 floating-point format. Other parts are decoupled and optimized for the respective workload.
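FP4 packs a floating-point value into just four bits. The article does not spell out the encoding Google uses; the sketch below assumes the common E2M1 layout (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit), as standardized in the OCP Microscaling formats:

```python
# Decode all 16 codes of a 4-bit E2M1 floating-point format.
# Assumption: Google's FP4 follows the usual E2M1 layout; the
# article itself does not specify the encoding.

def decode_fp4_e2m1(code: int) -> float:
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:                     # subnormal range: 0 or ±0.5
        return sign * man * 0.5
    return sign * (1 + 0.5 * man) * 2.0 ** (exp - 1)

values = sorted({decode_fp4_e2m1(c) for c in range(16)})
print(values)  # 15 distinct values from -6 to 6
```

The tiny value set (±0.5 … ±6 plus zero) is why FP4 halves memory and bandwidth needs relative to FP8, at the cost of precision.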

Left: the smaller TPU 8t; right: the TPU 8i.

(Image: Google)

The TPU 8i is the larger of the two AI accelerators. Google will use it to run AI agents that perform tasks for users.

A TPU 8i combines two compute dies containing the actual AI processing units, eight stacks of High-Bandwidth Memory (HBM3e), and an I/O chiplet. Two additional chiplets at the upper corners appear to serve purely to stabilize the package.

The model is tuned for high memory throughput and low latency. Its total of 288 GByte of HBM3e delivers a combined transfer rate of 8.6 TByte/s to load data as quickly as possible. In addition, Google places a 384 MByte SRAM cache in the AI units to reduce latency. A new Collectives Acceleration Engine (CAE), which aggregates the results of all AI processing units, serves the same goal.

Within a server, Google bundles the 8i TPUs into groups, which are then interconnected in a high-radix network; the company calls this the boardfly topology. Optical circuit switches (OCS) link more than 1000 chips via optical fiber, an approach that is likely still unique in the industry.

An entire TPU-8i pod comprises 1152 AI accelerators and nearly 332 TByte of HBM3e RAM. Google focuses here on the FP8 and INT8 data formats; up to 11.6 FP8 exaflops are possible.
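The pod figures follow from simple per-chip arithmetic. Note that 11.6 exaflops equals 1152 times the 10.1 PFlops the spec table lists as the per-chip FP4 peak, which suggests FP8 and FP4 throughput are identical on this chip:

```python
# Back-of-the-envelope check of the TPU-8i pod figures quoted above.
chips = 1152                  # accelerators per pod
hbm_per_chip_gb = 288         # GByte HBM3e per TPU 8i
pflops_per_chip = 10.1        # per-chip peak PFlops (spec table)

total_hbm_tb = chips * hbm_per_chip_gb / 1000   # GByte -> TByte
total_eflops = chips * pflops_per_chip / 1000   # PFlops -> EFlops

print(f"{total_hbm_tb:.0f} TByte HBM3e")   # ~332 TByte
print(f"{total_eflops:.1f} exaflops")      # ~11.6 exaflops
```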

Block diagram TPU 8i.

(Image: Google)

The TPU 8t combines a single compute die with four HBM3e stacks and an I/O die. At 12.6 FP4 Petaflops, a single accelerator is about 25 percent faster than a TPU 8i. On the memory side, the TPU 8t makes do with 216 GByte of HBM3e and a transfer rate of just over 6.5 TByte/s. The SRAM cache shrinks to 128 MByte. So-called Sparse Cores are intended to coordinate the irregular memory accesses that occur during AI training.

Google relies on massive scaling here: a pod accommodates up to 9600 8t TPUs with a total computing power of 121 FP4 exaflops and over two petabytes of HBM3e. The system underscores that all AI accelerators demand massive amounts of DRAM, not just Nvidia's. The chips are interconnected in a mesh (3D torus topology).
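These pod-level totals follow directly from the per-chip numbers:

```python
# Verify the TPU-8t pod totals from the per-chip specs quoted above.
chips = 9600                    # 8t TPUs per pod
hbm_per_chip_gb = 216           # GByte HBM3e per TPU 8t
fp4_pflops_per_chip = 12.6      # FP4 peak per chip

total_hbm_pb = chips * hbm_per_chip_gb / 1e6           # GByte -> PByte
total_fp4_eflops = chips * fp4_pflops_per_chip / 1000  # PFlops -> EFlops

print(f"{total_hbm_pb:.2f} PByte HBM3e")       # ~2.07 PByte ("over two")
print(f"{total_fp4_eflops:.0f} FP4 exaflops")  # 121
```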

Block diagram TPU 8t.

(Image: Google)

Accelerator | TPU 8t | TPU 8i
Focus | (Pre-)Training | Sampling, Serving, Reasoning
Network topology | 3D torus | Boardfly
Specializations | Sparse Cores & LLM Decoder Engine | Collectives Acceleration Engine
HBM3e capacity | 216 GByte | 288 GByte
SRAM cache | 128 MByte | 384 MByte
Max. FP4 PFlops | 12.6 | 10.1
HBM bandwidth | 6.528 TByte/s | 8.601 TByte/s

The 8th-generation TPUs once again require water cooling.

(Image: Google)

8th-generation TPU systems are expected to go into operation in the course of 2026. Chip contract manufacturer TSMC apparently produces at least the compute dies using 2-nanometer technology. Like previous generations, the TPU 8t is said to have been co-designed with Broadcom, which is involved in AI accelerators for all cloud hyperscalers.

For the TPU 8i, MediaTek is reportedly taking the lead. Dividing the work among several partners makes sense for Google, as it strengthens the company's position in negotiations. Google is also reportedly talking to Marvell about further derivatives.

However, the hyperscaler is by no means parting ways with Nvidia. At the Google Cloud Next event, AI hardware chief Amin Vahdat emphasized that Google is among the first customers of Nvidia's AI server Vera Rubin NVL72.

(mma)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.