Blackwell: Nvidia's AI accelerator design flaw fixed thanks to TSMC
The design error in Nvidia's Blackwell led to poor chip yields during production and months of delays. Now the CEO has given the all-clear.
Nvidia boss Jensen Huang with various Blackwell versions
(Image: Nvidia)
The design error in Nvidia's Blackwell GPUs for artificial intelligence computing has now been corrected, Nvidia CEO Jensen Huang said at an event in Copenhagen yesterday (Wednesday). The problem had delayed the AI accelerators, unveiled in March of this year, by months, so until now the chips had only been produced in small batches. Mass production is now being ramped up.
At the beginning of August, reports emerged of a design error that could delay Nvidia's new Blackwell AI chips by months. When it presented its most recent quarterly results, in which Nvidia surpassed 30 billion US dollars in quarterly revenue, the company admitted problems with the new Blackwell generation of AI accelerators (B100 and B200). It said at the time that production yields had to be improved by changing the Blackwell GPU mask.
Nvidia takes the blame and thanks TSMC
According to Reuters, the fix was achieved with the help of Nvidia's long-standing partner TSMC, the Taiwanese contract manufacturer that produces these chips. "We had a design flaw with Blackwell," Huang said yesterday. "It worked, but the design flaw resulted in a lower yield. It was 100 percent Nvidia's fault." At the same time, Huang dismissed reports of tensions between Nvidia and TSMC over the low chip yields as "fake news".
"To make a Blackwell computer work, seven different types of chips had to be developed from scratch and put into production at the same time," the Nvidia boss added. "TSMC helped us overcome these chip yield difficulties and restart Blackwell production at an incredible pace."
Fixing the problem took months
Rumors had been circulating that the yield was previously below ten percent, an abysmal and uneconomical figure, especially for the mature 4-nanometer process. According to these rumors, overly densely packed transistors were to blame. This is consistent with Nvidia's statement that TSMC has created new exposure masks. However, producing and validating new masks takes months, which delayed the start of mass production accordingly.
According to Huang, these production problems have now been resolved. At a recent Goldman Sachs investor conference, the Nvidia boss promised that Blackwell chips would be delivered in the fourth quarter. Prominent customers have already confirmed deliveries: Microsoft and OpenAI received their first Blackwell systems at the beginning of this month.
(fds)