Chips & Hardware · Report

NVIDIA Blackwell platform led all categories in MLPerf Training 6.0, the industry's rigorous peer-reviewed benchmark for

NVIDIA official — first-hand confirmation of roadmap / product.

Primary · Official · Slicast · June 22, 2026 01:00 · US · Source: NVIDIA Blog

NVIDIA brought together performance, scale and reliability in its Blackwell platform through extreme codesign, enabling AI model builders to launch frontier models faster, minimize training costs and start generating revenue sooner. This round of MLPerf Training 6.0 added two new mixture-of-experts (MoE) pretraining workloads, DeepSeek-V3 671B and GPT-OSS-20B, reflecting the growing centrality of MoE architectures. NVIDIA was the only platform with submissions on every benchmark in the suite, and it delivered the fastest time to train on all seven workloads.

NVIDIA submitted results on both NVIDIA GB200 NVL72 and GB300 NVL72 rack-scale systems. Within each system, fifth-generation NVIDIA NVLink Switches connect all 72 GPUs with high bandwidth into a unified pool of compute and memory, enabling them to act as one giant GPU. Large-scale MoE training requires all-to-all communication to route tokens across GPUs to reach the right expert subnetwork, and NVLink's bandwidth advantage makes that fast and efficient at scale.
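To make the all-to-all pattern concrete, the following is a minimal Python sketch, not NVIDIA's implementation. It assumes experts are spread evenly across GPUs and uses a random stand-in for the learned router; the function names `route_tokens` and `remote_fraction` are hypothetical. It counts how many token copies each GPU would have to send to every other GPU.

```python
import random


def route_tokens(num_tokens, num_experts, num_gpus, top_k, seed=0):
    """Return counts[src][dst]: token copies GPU src sends to GPU dst.

    Assumptions (illustrative only): tokens and experts are split evenly
    across GPUs, and the router is replaced by a random top_k choice.
    """
    assert num_experts % num_gpus == 0 and num_tokens % num_gpus == 0
    experts_per_gpu = num_experts // num_gpus
    tokens_per_gpu = num_tokens // num_gpus
    rng = random.Random(seed)
    counts = [[0] * num_gpus for _ in range(num_gpus)]
    for token in range(num_tokens):
        src = token // tokens_per_gpu  # GPU that holds this token
        # Stand-in for a learned router: choose top_k distinct experts.
        for expert in rng.sample(range(num_experts), top_k):
            dst = expert // experts_per_gpu  # GPU that hosts the expert
            counts[src][dst] += 1
    return counts


def remote_fraction(counts):
    """Fraction of routed token copies that must leave their source GPU."""
    total = sum(sum(row) for row in counts)
    local = sum(counts[i][i] for i in range(len(counts)))
    return (total - local) / total


if __name__ == "__main__":
    counts = route_tokens(num_tokens=1024, num_experts=16, num_gpus=8, top_k=2)
    print(f"remote fraction: {remote_fraction(counts):.2f}")
```

Under these assumptions most routed tokens land on a different GPU than the one holding them, so every GPU exchanges data with every other GPU in each MoE layer. That is the traffic pattern for which interconnect bandwidth matters.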

NVIDIA showcased NVFP4 training methods that increase performance while meeting strict accuracy requirements across large- and small-scale pretraining as well as fine-tuning workloads. The company recently used NVFP4 to pretrain the massive 550-billion-parameter NVIDIA Nemotron 3 Ultra model. The GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72 at the same scale, driven by higher compute density with NVFP4, expanded memory capacity and a higher power ceiling sustaining peak performance.
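The article does not describe how NVFP4 works internally. As a rough illustration only, the Python sketch below assumes a block-scaled 4-bit floating-point scheme: values are grouped into blocks of 16, each block shares one scale, and each element is rounded to the nearest E2M1 magnitude (0, 0.5, 1, 1.5, 2, 3, 4, 6). The block size, the full-precision scale and the names `quantize_block` and `fake_quantize` are assumptions made for this example, not NVIDIA's training recipe.

```python
# Magnitudes representable by a 4-bit E2M1 float (sign bit handled separately).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(block):
    """Quantize one block to signed E2M1 values with a shared scale.

    Assumption: the scale is kept in full precision here; a real format
    would store it in a narrow type as well.
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    scale = amax / 6.0  # map the block's largest magnitude to 6
    codes = []
    for x in block:
        # Round to the nearest representable magnitude.
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(x) / scale - m))
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def fake_quantize(values, block_size=16):
    """Quantize then dequantize, block by block, to expose rounding error."""
    out = []
    for start in range(0, len(values), block_size):
        scale, codes = quantize_block(values[start:start + block_size])
        out.extend(c * scale for c in codes)
    return out


if __name__ == "__main__":
    values = [0.1 * i for i in range(-8, 8)]
    for original, restored in zip(values, fake_quantize(values)):
        print(f"{original:+.2f} -> {restored:+.4f}")
```

Because the scale is chosen per block, one outlier only degrades the precision of its own 16 neighbors rather than the whole tensor. That is one plausible reason a 4-bit format can still meet accuracy targets.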

To support distributed training at scale, NVIDIA offers two complementary scale-out networking platforms: NVIDIA Quantum InfiniBand and NVIDIA Spectrum-X Ethernet, giving data centers flexibility to build large-scale clusters optimized for their infrastructure. On DeepSeek-V3 671B, the largest MoE model in the suite, NVIDIA scaled its submission to 8,192 GPUs using GB200 NVL72 systems, the largest-scale Blackwell-based submission in MLPerf Training to date. NVIDIA also submitted results at 5,120 GPUs with NVIDIA GB200 NVL72 systems on Llama 3.1 405B, one of the largest dense LLMs in the suite.

In production training environments, runs can span weeks or months across hundreds of thousands of GPUs, and effective training throughput depends on both system performance and resiliency.

NVIDIA's ecosystem partners participated extensively this round, with compelling submissions from 19 organizations including ASUSTeK, Microsoft Azure, Cisco, CoreWeave, Dell Technologies, Fujitsu, Giga Computing, Google Cloud, Hewlett Packard Enterprise, Inventec, Krai, Lambda, Nebius, Netweb Technologies India Ltd., Quanta Cloud Computing, ScitiX, Supermicro and TTA.

CoreWeave, housing NVIDIA infrastructure within Dell PowerRack systems with Dell PowerEdge servers, enabled Cohere to achieve 3x faster training on GB200 NVL72 for its North agentic AI platform. Midjourney trained its v8 image generation model on a Blackwell cluster and is now scaling a large fleet of Blackwell Ultra GPUs on CoreWeave to train upcoming image and video models. On Google Cloud, Thinking Machines Lab saw 2x faster training and serving speeds on GB300 NVL72 compared with prior-generation GPUs, accelerating frontier model research and reinforcement learning workflows. Nebius, running NVIDIA Blackwell and Blackwell Ultra infrastructure on its AI cloud, enabled Higgsfield to reduce model training time by 30%, supporting a platform that now serves 22 million users and generates over 6 million pieces of AI content per day.

Translated in full from the original.