NVIDIA Blackwell Ultra wins all seven MLPerf AI training benchmarks; GB300 NVL72 achieves 10-minute Llama 3.1 405B training record.
NVIDIA has announced that its Blackwell Ultra-powered GB300 NVL72 platform took first place in every MLPerf training benchmark. The company claims it is the only vendor to have submitted results on every MLPerf test and that it has widened the performance gap between itself and its rivals.
The benchmark results show Blackwell Ultra GPUs well ahead of earlier generations in key training workloads. In Llama 3.1 405B pretraining, the GB300 GPUs deliver over 4X the performance of H100 and nearly 2X that of the Blackwell GB200. Similarly, in Llama 2 70B fine-tuning, eight GB300 GPUs delivered 5X the performance of H100.
NVIDIA attributes its dominance to multiple factors, including its CUDA ecosystem, which gives it substantial leverage over competitors. The rack system is paired with Quantum-X800 InfiniBand networking at 800 Gb/s. The GB300 NVL72 provides 279 GB of HBM3e memory per GPU and 40 TB of total capacity with GPU and CPU memory combined, a large memory pool that helps accelerate AI workloads.
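As a rough sanity check on the memory figures above, the short sketch below works out how much of the 40 TB total the GPUs' HBM3e would account for. It assumes 72 GPUs per rack (the "72" in NVL72) and decimal units (1 TB = 1,000 GB); the CPU share is simply what is left over, not a figure stated in the results.

```python
# Back-of-the-envelope check of the rack memory figures quoted in the article.
GPUS_PER_RACK = 72      # assumption: the "72" in NVL72 is the GPU count per rack
HBM_PER_GPU_GB = 279    # from the article
TOTAL_MEMORY_TB = 40    # from the article (GPU and CPU memory combined)
GB_PER_TB = 1000        # assumption: decimal units

total_hbm_gb = GPUS_PER_RACK * HBM_PER_GPU_GB
total_hbm_tb = total_hbm_gb / GB_PER_TB
implied_cpu_tb = TOTAL_MEMORY_TB - total_hbm_tb  # remainder, not a quoted figure

print(f"Aggregate HBM3e: {total_hbm_gb} GB (~{total_hbm_tb:.1f} TB)")
print(f"Implied CPU-attached memory: ~{implied_cpu_tb:.1f} TB")
```

Under these assumptions, roughly half of the quoted 40 TB would be GPU HBM3e and the rest CPU-attached memory.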
A key strategy enabling these performance gains is NVIDIA's adoption of FP4 precision for LLM training at every layer, which doubles the speed of calculations compared to FP8. Blackwell Ultra boosts this further to 3X, allowing the company to achieve higher performance without increasing GPU count. For the Llama 3.1 405B benchmark, the results were achieved using 5,120 Blackwell GB200 GPUs, which completed training in only 10 minutes.
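To put the Llama 3.1 405B run in perspective, the quoted GPU count and time can be converted into total GPU time. This is a simple illustration using the article's rounded figures; the exact time-to-train in the submission may differ slightly from 10 minutes.

```python
# Convert the quoted run (GPU count x wall-clock time) into total GPU time.
GPU_COUNT = 5120       # from the article
TRAIN_MINUTES = 10     # from the article (rounded)

gpu_minutes = GPU_COUNT * TRAIN_MINUTES
gpu_hours = gpu_minutes / 60

print(f"Total GPU time: {gpu_minutes} GPU-minutes (~{gpu_hours:.0f} GPU-hours)")
```

With these figures, the run amounts to roughly 853 GPU-hours of compute delivered in 10 minutes of wall-clock time.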