Chips & Hardware · Report

Nvidia's Blackwell GPU platform requires rework and faces shipment delays, with GB200A also being redesigned.

Blackwell delays directly compress infrastructure deployment timelines and create urgency for customers to explore alternative GPU architectures and suppliers.

Trade press · Slicast · August 4, 2024 · Global · Source: semianalysis.com

importance 86

Nvidia's Blackwell family is encountering major issues in reaching high-volume production, affecting production targets for Q3/Q4 2024 and the first half of 2025 and, with them, Nvidia's volume and revenue. To compensate for the delays, Nvidia is extending the lifespan and shipments of Hopper. Blackwell product timelines are pushed out, but volumes are affected more significantly than first-shipment dates. The technical challenges have sent Nvidia scrambling to create completely new systems that were not previously planned, with ramifications across dozens of downstream and upstream suppliers.

The most technically advanced chip in Nvidia's Blackwell family is the GB200, where Nvidia makes aggressive technical choices at the system level. The 72-GPU rack has a power density of approximately 125 kW per rack, whereas the standard for most datacenter deployments is 12 kW to 20 kW per rack; this compute and power density has never been achieved before. Numerous issues have cropped up related to power delivery, overheating, the water cooling supply chain ramp, water leakage from quick disconnects, and board complexity. While these have sent some suppliers and designers scrambling, most of them are minor and are not the cause of Nvidia's reduction in volumes or its major roadmap rework.
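To put the rack figures above in proportion, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted in the paragraph (about 125 kW, 72 GPUs, 12 kW to 20 kW for a typical rack); the function name and dictionary keys are illustrative and do not come from the source. Note that dividing rack power by GPU count gives each GPU's share of the whole rack budget, including CPUs, switches, and other overhead, not the power draw of a GPU by itself.

```python
# Figures quoted in the report; everything else here is illustrative.
GB200_RACK_KW = 125.0            # approximate power of the 72-GPU rack
GPUS_PER_RACK = 72
TYPICAL_RACK_KW = (12.0, 20.0)   # typical datacenter rack range


def rack_power_summary(rack_kw=GB200_RACK_KW,
                       gpus=GPUS_PER_RACK,
                       typical_kw=TYPICAL_RACK_KW):
    """Compare one dense rack against a typical per-rack power range.

    Returns the rack power divided by GPU count (a share of the whole
    rack budget, not the GPU's own draw) and how many times the rack
    exceeds the low and high ends of the typical range.
    """
    low, high = typical_kw
    return {
        "rack_kw_per_gpu": rack_kw / gpus,
        "multiple_of_typical_high": rack_kw / high,
        "multiple_of_typical_low": rack_kw / low,
    }


if __name__ == "__main__":
    summary = rack_power_summary()
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

Under these figures the rack draws roughly 6 to 10 times what a typical rack is provisioned for, which is consistent with the power delivery and cooling issues listed above.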

The core issue affecting shipments is directly related to Nvidia's design of the Blackwell architecture and to the supply of the original Blackwell package, which is limited by packaging issues at TSMC. Blackwell is the first high-volume design packaged with TSMC's CoWoS-L technology, which uses an RDL interposer with local silicon interconnects and bridge dies embedded in the interposer. CoWoS-L succeeds CoWoS-S because TSMC has already scaled CoWoS-S to interposers of approximately 3.5x reticle size, the practical limit given the brittleness and handling difficulties of large, thin silicon interposers. Silicon bridges embedded in organic interposers can supplement signal density where organic interposers alone lack the electrical performance needed for more powerful accelerators.

CoWoS-L is a much more complex technology, but it represents the future, and Nvidia and TSMC are aiming for an aggressive ramp of over one million chips per quarter. Embedding multiple fine-bump-pitch bridges in the organic interposer creates a coefficient of thermal expansion (CTE) mismatch between the silicon dies, bridges, organic interposer, and substrate, which causes warpage. Bridge die placement requires very high accuracy, especially for the bridges between the two main compute dies that support the 10 TB/s chip-to-chip interconnect. A major design issue requires the bridge dies to be redesigned, and rumors indicate a redesign of the top few global routing metal layers and the bumps of the Blackwell die as well; this redesign work is a primary cause of the multi-month delay. Additionally, TSMC lacks sufficient CoWoS-L capacity in aggregate. TSMC built up extensive CoWoS-S capacity over recent years, with Nvidia taking the lion's share, but Nvidia's rapid shift in demand to CoWoS-L has forced TSMC to build a new fab, AP6, for CoWoS-L and to convert existing CoWoS-S capacity at AP3, which makes the ramp lumpy.

Taken together, these issues mean TSMC cannot supply as many Blackwell chips as Nvidia wants, so Nvidia is focusing available capacity almost entirely on the GB200 NVL 36x2 and NVL72 rack-scale systems. HGX form factors with the B100 and B200 are now effectively cancelled outside of some initial lower volumes. To satisfy demand, Nvidia is introducing the B200A, a Blackwell GPU based on the B102 die, which will also be used in the China version of Blackwell, called B20. The B102 is a single monolithic compute die with 4 stacks of HBM, which allows it to be packaged on CoWoS-S instead of CoWoS-L, or even by Nvidia's other 2.5D packaging suppliers such as Amkor, ASE SPIL, and Samsung. The extensive shoreline area that the original Blackwell die dedicates to C2C I/O is unnecessary in a single monolithic SoC.
