Google introduces two TPUs as alternatives to Nvidia's AI accelerators
The eighth TPU generation comes in two variants. Both do without AMD and Intel processors; Google prefers to pair them with its own ARM CPUs.
(Image: Google)
Google's eighth generation of Tensor Processing Units (TPUs) comes with a twist: it arrives in two versions, the TPU 8t, optimized for training AI models, and the TPU 8i, optimized for running them (inference). In addition, Google is pairing its TPUs for the first time with its own ARM processors (Axion).
Both TPU variants share some developments, such as a doubled chip-to-chip transfer rate of 19.2 Tbit/s and support for the particularly compact FP4 floating-point format. Other parts are decoupled and optimized for the respective use case.
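Google has not said which FP4 flavor the TPUs implement. The sketch below assumes the E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) from the OCP Microscaling spec, the variant commonly used in AI accelerators, and shows just how compact the format is: its 16 encodings cover only 15 distinct values.

```python
# Sketch of FP4 under the assumed E2M1 layout (1 sign, 2 exponent,
# 1 mantissa bit); Google hasn't confirmed this is the variant used.

def fp4_e2m1_values():
    """Enumerate the distinct values of all 16 FP4 E2M1 encodings."""
    vals = set()
    for sign in (1.0, -1.0):
        for e in range(4):          # 2 exponent bits
            for m in range(2):      # 1 mantissa bit
                if e == 0:          # subnormal: no implicit leading 1
                    mag = m * 0.5
                else:               # normal: implicit 1, exponent bias 1
                    mag = (1 + m / 2) * 2 ** (e - 1)
                vals.add(sign * mag)
    return sorted(vals)

def quantize_fp4(x):
    """Round a float to the nearest representable FP4 value."""
    return min(fp4_e2m1_values(), key=lambda v: abs(v - x))

print(fp4_e2m1_values())  # [-6.0, -4.0, -3.0, -2.0, -1.5, ..., 4.0, 6.0]
print(quantize_fp4(2.7))  # 3.0
```

The coarse value grid is why FP4 weights are normally stored with a per-block scale factor; the payoff is that each value needs only half the memory and bandwidth of FP8.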
(Image: Google)
TPU 8i for Inference
The TPU 8i is the larger of the two AI accelerators. Google will use it to run AI agents that perform tasks for users.
A TPU 8i consists of two compute dies with the actual AI processing units, eight memory stacks of type High-Bandwidth Memory (HBM3e), and an I/O chiplet. Two additional chiplets at the upper corners appear to serve purely to stabilize the package.
The model is tuned for high memory throughput and low latency. Its 288 GByte of HBM3e deliver a combined transfer rate of 8.6 TByte/s to load data as quickly as possible. At the same time, a 384 MByte SRAM cache in the AI units reduces latency, as does a new Collectives Acceleration Engine (CAE) that aggregates the results of all AI processing units.
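A quick back-of-envelope calculation shows why the bandwidth matters for inference: in memory-bound LLM decoding, each generated token needs roughly one full pass over the model weights, so the time for one sweep over the HBM caps the token rate. The figures are the article's; a model whose weights fill the entire HBM is an assumption for illustration.

```python
# Back-of-envelope check on the TPU 8i memory figures quoted above.
hbm_capacity_gbyte = 288      # GByte HBM3e per TPU 8i
hbm_bandwidth_tbyte_s = 8.6   # combined transfer rate in TByte/s

sweep_s = hbm_capacity_gbyte / (hbm_bandwidth_tbyte_s * 1000)
print(f"Full HBM sweep: {sweep_s * 1000:.1f} ms")      # ~33.5 ms
print(f"Decode ceiling: ~{1 / sweep_s:.0f} tokens/s")  # ~30, if weights fill HBM
```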
In a server, Google bundles the 8i TPUs into groups, which are in turn cross-linked with one another; the company calls this the boardfly topology. The Optical Circuit Switches (OCS) that connect more than 1000 chips operate via optical fibers, an approach that is probably still unique in the industry.
An entire TPU 8i pod comprises 1152 AI accelerators and nearly 332 TByte of HBM3e RAM. Here Google focuses on the FP8 and INT8 data formats; up to 11.6 FP8 exaflops are possible.
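The pod figures follow directly from the per-chip numbers; a quick sanity check, with the per-chip FP8 rate derived from the pod total:

```python
# Sanity check on the TPU 8i pod figures quoted above.
chips = 1152
hbm_per_chip_gbyte = 288
pod_fp8_eflops = 11.6

print(f"Pod HBM3e: {chips * hbm_per_chip_gbyte / 1000:.0f} TByte")  # 332 TByte
print(f"Per chip:  {pod_fp8_eflops * 1000 / chips:.1f} PFlops FP8") # ~10.1 PFlops
```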
(Image: Google)
Training Pods with up to 9600 TPU 8t
The TPU 8t combines a single compute die with four HBM3e stacks and an I/O die. At 12.6 FP4 Petaflops, a single accelerator is about 25 percent faster than a TPU 8i. On the memory side, the TPU 8t makes do with 216 GByte of HBM3e and a transfer rate of just over 6.5 TByte/s. The SRAM cache shrinks to 128 MByte. So-called Sparse Cores are designed to coordinate the irregular memory accesses that occur during AI training.
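The "about 25 percent faster" figure follows directly from the two peak FP4 rates (see the spec table below):

```python
# Derive the speedup claim from the peak FP4 rates in the spec table.
tpu8t_fp4_pflops = 12.6
tpu8i_fp4_pflops = 10.1
print(f"{(tpu8t_fp4_pflops / tpu8i_fp4_pflops - 1) * 100:.0f} % faster")  # 25 %
```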
Google relies on massive scaling here: a pod accommodates up to 9600 TPU 8t with a total computing power of 121 FP4 exaflops and over two Petabytes of HBM3e. The system underscores that it is not just Nvidia's AI accelerators that demand massive amounts of DRAM. The chips are interconnected in a mesh with wrap-around links (3D torus topology).
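In a 3D torus, every chip has exactly six direct neighbors, one pair per axis, with links wrapping around at the grid edges. Google has not published the pod's grid dimensions; the 20 × 24 × 20 split below is purely a hypothetical factorization of 9600 for illustration.

```python
# Sketch of a 3D torus: a 3D mesh whose links wrap around at the edges.
# The grid dimensions are an assumption; Google has only stated 9600 chips.
DIMS = (20, 24, 20)  # hypothetical axes: 20 * 24 * 20 = 9600 chips

def torus_neighbors(coord):
    """Return the six direct neighbors of a chip, one pair per axis."""
    neighbors = []
    for axis, size in enumerate(DIMS):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # modulo = wrap-around link
            neighbors.append(tuple(n))
    return neighbors

print(torus_neighbors((0, 0, 0)))  # includes wrapped (19, 0, 0), (0, 23, 0), ...

# Pod totals quoted in the article:
chips = 20 * 24 * 20
print(f"{chips * 12.6 / 1000:.0f} FP4 exaflops")  # 121
print(f"{chips * 216 / 1e6:.2f} PByte HBM3e")     # 2.07
```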
(Image: Google)
| Accelerators | TPU 8t | TPU 8i |
| --- | --- | --- |
| Focus | (Pre-)Training | Sampling, Serving, Reasoning |
| Network Topology | 3D Torus | Boardfly |
| Specializations | Sparse Core & LLM Decoder Engine | Collectives Acceleration Engine |
| HBM3e Capacity | 216 GByte | 288 GByte |
| SRAM Cache | 128 MByte | 384 MByte |
| Max. FP4 PFlops | 12.6 | 10.1 |
| HBM Bandwidth | 6.528 TByte/s | 8.601 TByte/s |
Numerous Partners on Board
(Image: Google)
8th-gen TPU systems are expected to be operational later in 2026. Apparently, chip contract manufacturer TSMC produces at least the compute dies using 2-nanometer technology. The TPU 8t, like previous generations, is said to have been co-designed with Broadcom, which is involved with AI accelerators for all cloud hyperscalers.
For the TPU 8i, Mediatek is reportedly taking the lead. Dividing the work among several partners makes sense for Google, as it strengthens the company's position in negotiations. Google is also reportedly in talks with Marvell about further derivatives.
However, the hyperscaler is by no means parting ways with Nvidia. At the Google Cloud Next event, AI hardware chief Amin Vahdat emphasized that Google is among the first customers of Nvidia's AI server Vera Rubin NVL72.
(mma)