Blackwell GPUs confirmed in production, with NVLINK upgraded to 1.8 TB/s of bandwidth and a new FP4 data type for generative AI.
NVIDIA has addressed recent rumors about delays in Blackwell's rollout by demonstrating the chip up and running in one of its data centers during a press session ahead of Hot Chips. The company reaffirmed that Blackwell is on track to ramp and will ship to customers later this year, dismissing speculation about defects or market delays.
Blackwell is not a single chip but an entire platform spanning a wide range of designs for data center, cloud, and AI customers, with each Blackwell product composed of several chips. The generation is designed to meet modern AI needs and deliver strong performance for large language models such as Meta's Llama 3.1 405B. As LLMs grow in parameter count, data centers need more compute and lower latency, which calls for multi-GPU inference: the calculations are split across multiple GPUs to achieve low latency and high throughput. This multi-GPU setup demands high-bandwidth GPU-to-GPU communication, because each GPU must send its results to every other GPU at each layer.
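The communication pattern described above can be illustrated with a toy model. The sketch below is plain Python with no real GPUs involved: it splits one matrix-vector product across several simulated devices and then has every device exchange its partial result with every other device. The function names and the tiny example matrix are hypothetical and chosen only for illustration; real systems use optimized collective libraries and smarter exchange schemes than this naive all-to-all.

```python
# Toy model of multi-GPU (tensor-parallel) inference using plain Python lists.
# All names here are illustrative; nothing below touches real GPU hardware.

def matvec(weights, x):
    """Dense matrix-vector product: weights is rows x cols, x has length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def split_columns(weights, x, num_gpus):
    """Give each simulated GPU a contiguous slice of the input dimension."""
    step = len(x) // num_gpus  # assumes len(x) is divisible by num_gpus
    shards = []
    for g in range(num_gpus):
        lo, hi = g * step, (g + 1) * step
        shards.append(([row[lo:hi] for row in weights], x[lo:hi]))
    return shards

def all_reduce_sum(partials):
    """Naive exchange: every GPU sends its partial result to every other GPU.

    Returns one summed copy per GPU plus the number of point-to-point messages.
    """
    n = len(partials)
    messages = n * (n - 1)
    total = [sum(vals) for vals in zip(*partials)]
    return [list(total) for _ in range(n)], messages

def parallel_layer(weights, x, num_gpus):
    """Compute one layer's output with the work split across num_gpus devices."""
    shards = split_columns(weights, x, num_gpus)
    partials = [matvec(w, xs) for w, xs in shards]
    return all_reduce_sum(partials)

weights = [[1, 2, 3, 4], [5, 6, 7, 8]]
x = [1, 0, 2, 1]
results, messages = parallel_layer(weights, x, num_gpus=4)
print(results[0], messages)  # every GPU holds the same output as matvec(weights, x)
```

With 4 simulated GPUs the exchange already takes 12 messages for a single layer, and the count grows roughly with the square of the GPU count, which is why interconnect bandwidth matters so much for this style of inference.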
NVIDIA's solution centers on the NVSwitch. With a 900 GB/s interconnect, Hopper NVLINK switches offer up to 1.5x higher inference throughput than traditional GPU-to-GPU approaches, because traffic needs only one hop to the NVSwitch and then goes directly to the second GPU. With Blackwell, NVIDIA has introduced a faster NVLINK Switch that doubles the fabric bandwidth to 1.8 TB/s. This 800 mm² die, built on TSMC's 4NP node, extends NVLINK to 72 GPUs in GB200 NVL72 racks, providing 7.2 TB/s of full all-to-all bidirectional bandwidth over 72 ports, along with 3.6 TFLOPS of in-network compute. The NVLINK Switch Tray carries two of these switches, for up to 14.4 TB/s of total bandwidth.
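The bandwidth figures quoted above fit together with simple arithmetic, which the short sketch below spells out. The constants come straight from the figures in this article. The per-port number is only what falls out of dividing the switch bandwidth by its port count, and the 900 MB payload in `ideal_transfer_ms` is a hypothetical value picked to make the numbers round. The estimate ignores latency and protocol overhead, so treat it as a best case rather than a measurement.

```python
# Back-of-the-envelope check of the NVLINK figures quoted in the article.
# All bandwidths are in GB/s to keep the arithmetic in whole numbers.

HOPPER_NVLINK_GBPS = 900                         # Hopper interconnect bandwidth
BLACKWELL_NVLINK_GBPS = 2 * HOPPER_NVLINK_GBPS   # "doubles" -> 1800 GB/s (1.8 TB/s)

SWITCH_BANDWIDTH_GBPS = 7200                     # 7.2 TB/s per NVLINK Switch chip
SWITCH_PORTS = 72
SWITCHES_PER_TRAY = 2

per_port_gbps = SWITCH_BANDWIDTH_GBPS // SWITCH_PORTS      # implied bandwidth per port
tray_gbps = SWITCH_BANDWIDTH_GBPS * SWITCHES_PER_TRAY      # both switches in one tray

def ideal_transfer_ms(payload_mb, bandwidth_gbps):
    """Best-case time in milliseconds to move payload_mb megabytes.

    MB divided by GB/s gives milliseconds (taking 1 GB = 1000 MB).
    Ignores latency and protocol overhead.
    """
    return payload_mb / bandwidth_gbps

print(BLACKWELL_NVLINK_GBPS / 1000, "TB/s per GPU")
print(per_port_gbps, "GB/s per switch port")
print(tray_gbps / 1000, "TB/s per switch tray")
print(ideal_transfer_ms(900, HOPPER_NVLINK_GBPS), "ms on Hopper")        # hypothetical 900 MB payload
print(ideal_transfer_ms(900, BLACKWELL_NVLINK_GBPS), "ms on Blackwell")
```

Under these idealized assumptions, the same payload moves in half the time on the Blackwell fabric, which is the practical meaning of the doubled bandwidth.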
New liquid-cooling solutions will be adopted by Grace Blackwell GB200 and B200 systems, with tutorials on "Liquid Cooling Boosts Performance and Efficiency" planned for Hot Chips. The warm-water, direct-to-chip approach offers improved cooling efficiency, lower operating costs, longer server life, and the possibility of heat reuse, delivering up to a 28% reduction in data center facility power costs.
NVIDIA is also showcasing the world's first generative AI image made using FP4 compute. Running Stable Diffusion, the FP4-quantized model produces results very similar to those of FP16 models at much faster speeds, as part of the Quasar Quantization system.
The company also uses AI to build chips for AI: generative AI produces optimized code in Verilog, a hardware description language used to describe circuits and verify processors. This speeds up the development of next-generation chip architectures and helps NVIDIA keep to its yearly cadence.
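To give a sense of what quantizing a model to FP4 involves, here is a minimal sketch. It assumes the common E2M1 layout for a 4-bit float (1 sign bit, 2 exponent bits, 1 mantissa bit) and a simple shared absmax scale per block of values. The article does not describe how the Quasar Quantization system works internally, so this is not NVIDIA's method, and the function names and sample weights are hypothetical.

```python
# Minimal FP4 quantization sketch (assumed E2M1 layout, simple absmax scaling).
# This is an illustration only, not NVIDIA's Quasar Quantization system.

# Magnitudes representable by a 4-bit E2M1 float; the sign bit doubles the set.
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0

def quantize_block(values):
    """Map a block of floats to FP4 values that share one scale factor."""
    absmax = max(abs(v) for v in values)
    scale = absmax / FP4_MAX if absmax > 0 else 1.0
    codes = []
    for v in values:
        target = abs(v) / scale
        # Pick the closest representable magnitude (ties go to the smaller one).
        nearest = min(FP4_E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(-nearest if v < 0 else nearest)
    return codes, scale

def dequantize_block(codes, scale):
    """Recover approximate float values from FP4 codes and the shared scale."""
    return [c * scale for c in codes]

weights = [0.12, -0.40, 0.03, 0.75, -1.50, 0.60]   # hypothetical sample weights
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
print(codes)     # FP4 values before rescaling
print(restored)  # approximations of the original weights
```

Each value now needs only 4 bits plus a share of the block's scale factor, a quarter of the storage of FP16. The price is rounding error: in this example, 0.12 comes back as 0.125 and 0.03 collapses to zero.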
NVIDIA is expected to introduce the Blackwell Ultra GPU next year, featuring 288 GB of HBM3e memory, increased compute density, and more AI FLOPS, followed by the Rubin and Rubin Ultra GPUs in 2026 and 2027, respectively.