For AI supercomputers: Cerebras' giant processor with 4 trillion transistors

Thanks to production in 5-nanometer instead of 7-nanometer technology, the computing power of the Wafer Scale Engine WSE-3 rises to up to 126 PFlops.

Cerebras Wafer Scale Engine WSE-3

(Image: Cerebras)

The Californian company Cerebras is following up: the new generation of its giant Wafer Scale Engine processor, the WSE-3, comes from TSMC's 5-nanometer production. The higher transistor count, now a total of 4 trillion instead of around 2.6 trillion in the WSE-2, increases the computing power at a similar electrical power consumption.

With the WSE-3, Cerebras is equipping the next generation of its in-house AI systems, called CS-3, which can be coupled with additional memory expansions in order to train even larger AI models. Cerebras promises that a CS-3 cluster can train AI models with up to 24 trillion parameters.

Like its predecessors WSE (2019) and WSE-2 (2021), the WSE-3 occupies the entire usable area of a 30-centimeter silicon wafer. Cerebras uses the 4 trillion transistors not only for 900,000 AI computing cores, which are interconnected at high speed, but also for 44 GB of fast SRAM.
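To put these figures in proportion, a quick back-of-envelope calculation (our own arithmetic based on the numbers above, not an official Cerebras specification) shows what each core gets on average:

```python
# Back-of-envelope per-core figures derived from the numbers in the article;
# the averages are our own arithmetic, not official Cerebras specifications.

CORES = 900_000            # AI computing cores on one WSE-3
SRAM_BYTES = 44 * 10**9    # 44 GB of on-chip SRAM
PEAK_FLOPS = 126 * 10**15  # up to 126 PFlops (see the intro above)

print(f"SRAM per core: {SRAM_BYTES / CORES / 1024:.0f} KiB")      # ~48 KiB
print(f"Peak per core: {PEAK_FLOPS / CORES / 10**9:.0f} GFlops")  # ~140 GFlops
```

Each core thus holds only a small slice of local memory on average; the performance comes from the sheer number of cores working in parallel on data kept on the wafer.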

Cerebras computing module CS-3 with a WSE-3.

(Image: Cerebras)

Up to 2048 CS-3 systems can be linked together via fast interfaces.

Furthermore, additional memory expansions (MemoryX) with 1.5 TByte, 12 TByte, or 1.2 PByte of RAM can be connected to each CS-3 via the SwarmX interface.

According to Cerebras, a "compact cluster" of four CS-3s can fine-tune an AI model with 70 billion parameters in a single day. However, a single CS-2 already costs around 2.5 million US dollars; Cerebras does not publish exact prices.

According to Cerebras, a data center with 2048 CS-3s should be able to train the generative AI language model Llama 70B in one day. Cerebras is currently building such data centers together with G42, a company owned by Arab investor Mubadala.
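A rough plausibility check of that one-day claim, using the common rule of thumb of about 6 FLOPs per parameter per training token. Note that the token count of around 2 trillion is our assumption (roughly Llama-2 scale), not a figure from Cerebras:

```python
# Rough plausibility check of the "Llama 70B in one day" claim. The 6*N*D
# compute estimate is a common rule of thumb; the token count is an
# assumption (roughly Llama-2 scale), not a figure from the article.

PARAMS = 70e9                      # model parameters
TOKENS = 2e12                      # assumed training tokens
TRAIN_FLOPS = 6 * PARAMS * TOKENS  # ~8.4e23 FLOPs total

CLUSTER_PEAK = 2048 * 126e15       # 2048 CS-3s at up to 126 PFlops each

hours_at_peak = TRAIN_FLOPS / CLUSTER_PEAK / 3600
print(f"~{hours_at_peak:.1f} hours at 100 % utilization")  # ~0.9 hours
```

Under these assumptions the cluster would need well under an hour at full utilization, so the one-day figure leaves plenty of headroom for realistic efficiency.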

Cerebras emphasizes a special feature of the WSEs: they are designed to automatically process sparse matrices faster. In particular, they are supposed to benefit from "unstructured sparsity". Main competitor Nvidia also emphasizes that its AI accelerators such as the H100 "Hopper" achieve twice the computing power thanks to sparsity, for example when processing Int8 values.
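The idea behind unstructured sparsity: wherever a weight happens to be zero, the corresponding multiply-accumulate can be skipped, no matter where in the matrix the zeros sit. A minimal illustrative sketch (not Cerebras code):

```python
import numpy as np

def sparse_matvec(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Matrix-vector product that skips zero weights (unstructured sparsity).

    Purely illustrative: the point is that the work scales with the number
    of non-zero weights, not with the full matrix size.
    """
    rows, cols = np.nonzero(weights)  # positions of the non-zero weights
    y = np.zeros(weights.shape[0])
    for r, c in zip(rows, cols):      # one multiply-add per non-zero
        y[r] += weights[r, c] * x[c]
    return y

# A 4x4 weight matrix with 75 percent zeros: only 4 of 16 products are needed.
W = np.array([[0., 2., 0., 0.],
              [0., 0., 0., 1.],
              [3., 0., 0., 0.],
              [0., 0., 5., 0.]])
print(sparse_matvec(W, np.ones(4)))   # same result as W @ x
```

Nvidia's sparsity speedup, by contrast, requires a structured 2:4 pattern (two zeros in every group of four weights), whereas Cerebras claims its cores can skip arbitrarily placed zeros.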

Models trained on Cerebras machines are supposed to run particularly efficiently on inference servers with AI computing accelerators from Qualcomm; the two companies are cooperating to this end.


(ciw)