Chips & Hardware · Report

Nvidia shows Blackwell servers being installed and reiterates its roadmap: Blackwell Ultra in 2025, followed by Vera CPUs and Rubin GPUs in 2026.

The showing indicates that Blackwell is entering production and sets the architecture template for the industry's capex cycle; the successor products lay out a multi-year GPU roadmap.

Trade press · Slicast · August 24, 2024 · Global · Source: yahoo.com

Ahead of the Hot Chips 2024 conference, Nvidia showed elements of its Blackwell platform: servers being installed and configured, existing Hopper H200 solutions, FP4 LLM optimizations using its Quasar Quantization System, warm water liquid cooling for data centers, and AI tools for designing better chips. The company reiterated that Blackwell is not just a GPU but an entire platform and ecosystem. Much of what Nvidia presented was already known, including its data center and AI roadmap, which shows Blackwell Ultra coming in 2025, Vera CPUs and Rubin GPUs in 2026, and Rubin Ultra in 2027. Those details were first confirmed at Computex in June.

Blackwell was reportedly delayed by three months. Nvidia neither confirmed nor denied the report, choosing instead to show images of Blackwell systems being installed, along with photos and renders of the internal hardware in the Blackwell GB200 racks and NVLink switches. The hardware looks built to draw substantial power, with correspondingly robust cooling, and it also looks very expensive.

Nvidia showed performance results from its existing H200: when running a Llama 3.1 70B parameter model, inference performance with NVSwitch can be up to 1.5X higher than with point-to-point designs. Blackwell doubles the NVLink bandwidth for further gains, with an NVLink Switch Tray offering 14.4 TB/s of aggregate bandwidth. To address rising data center power requirements, Nvidia is working with partners on warm water cooling, where the heated water can potentially be recirculated for heating to further reduce costs. The company claims it has seen up to a 28% reduction in data center power use with this technology.

To prepare for Blackwell's native FP4 support, Nvidia has made sure its latest software takes advantage of the new hardware features without sacrificing accuracy. Using its Quasar Quantization System to tune workload results, Nvidia says it can deliver essentially the same quality as FP16 while using one quarter of the bandwidth. Nvidia has also created an internal LLM to help speed up design, debug, analysis, and optimization of circuit descriptions written in Verilog, a key factor in creating the 208 billion transistor Blackwell B200 GPU. This tool will be used to create even better models for the next-generation Rubin GPUs and beyond.
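
The "one quarter" figure follows directly from the bit widths: an FP4 value occupies 4 bits against 16 for FP16. The short Python sketch below works through that arithmetic for a 70B-parameter model, a size chosen purely for illustration. It is a back-of-the-envelope estimate, not a description of how the Quasar Quantization System works: the helper name weight_bytes is hypothetical, and the calculation assumes weights only, ignoring any per-block scaling factors, activations, and KV cache, so real-world savings would likely be somewhat smaller.

```python
# Rough weight-storage arithmetic for FP16 vs FP4.
# Assumption: counts raw weight bits only; quantization scale factors,
# activations, and KV cache are ignored.

def weight_bytes(num_params: int, bits_per_param: int) -> float:
    """Bytes needed to store num_params values at the given bit width."""
    return num_params * bits_per_param / 8

PARAMS_70B = 70_000_000_000  # illustrative 70B-parameter model

fp16_bytes = weight_bytes(PARAMS_70B, 16)
fp4_bytes = weight_bytes(PARAMS_70B, 4)
ratio = fp4_bytes / fp16_bytes

print(f"FP16 weights: {fp16_bytes / 1e9:.0f} GB")
print(f"FP4 weights:  {fp4_bytes / 1e9:.0f} GB")
print(f"FP4 / FP16:   {ratio:.2f}")
```

Under these assumptions, every weight read from memory or sent over an interconnect moves a quarter of the data, which is where the bandwidth saving comes from.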
