NVIDIA H100 GPU sets new AI benchmark records, establishing performance leadership.
NVIDIA's H100 Tensor Core GPU has delivered record-breaking results in the latest MLPerf v2.1 benchmarking tests. MLPerf, a product of MLCommons, benchmarks machine learning models, software, and hardware and serves as the industry benchmark for deep learning, AI training, AI inference, and HPC. The specific test, MLPerf Inference v2.1, measures inference performance: how fast a system can process inputs and produce results using a trained model. The H100 is NVIDIA's ninth-generation data center GPU and, compared to NVIDIA's previous-generation A100 GPU, provides an order-of-magnitude performance increase for large-scale AI and HPC.
In the Data Center category, the NVIDIA H100 Tensor Core GPU delivered the highest per-accelerator performance across every workload in both the Server and Offline tests, with up to 4.5x more performance in the Offline scenario and up to 3.9x more in the Server scenario than the A100 Tensor Core GPU. NVIDIA attributes part of the H100's superior performance on the BERT NLP model to its Transformer Engine. The new engine, combined with NVIDIA Hopper FP8 Tensor Cores, delivers up to 9x faster AI training and up to 30x faster AI inference on large language models compared to the prior generation.
Speed is particularly crucial because huge AI models can have trillions of parameters and may require months to train. NVIDIA's Transformer Engine provides additional speed by combining 16-bit floating-point precision with a new 8-bit floating-point data format, which doubles Tensor Core throughput and halves memory requirements compared to 16-bit floating-point. These improvements, plus advanced Hopper software algorithms, allow models to be trained within days or hours instead of months, enabling earlier returns on investment and faster implementation of operational improvements.
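The halving of memory requirements follows directly from the width of the data format: an 8-bit value occupies one byte, a 16-bit value two. The short Python sketch below works through that arithmetic for weight storage only. The function name and the one-trillion-parameter model size are hypothetical and chosen purely for illustration, and the sketch ignores activations, optimizer state, and the fact that a real mixed-precision run keeps some tensors in 16-bit.

```python
# Storage size of one value in each floating-point format, in bytes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weight_memory_gib(num_params: int, fmt: str) -> float:
    """Memory needed to store num_params values in the given format, in GiB."""
    return num_params * BYTES_PER_PARAM[fmt] / 2**30


# Hypothetical model size, used only to illustrate the arithmetic.
EXAMPLE_PARAMS = 1_000_000_000_000  # one trillion parameters

fp16_gib = weight_memory_gib(EXAMPLE_PARAMS, "fp16")
fp8_gib = weight_memory_gib(EXAMPLE_PARAMS, "fp8")

print(f"FP16 weights: {fp16_gib:,.0f} GiB")
print(f"FP8 weights:  {fp8_gib:,.0f} GiB")
print(f"Reduction:    {fp16_gib / fp8_gib:.1f}x")
```

Under these assumptions the ratio is 2x regardless of model size, which matches the memory reduction quoted above. Actual savings in practice depend on how many tensors are held in the 8-bit format.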
Although the H100 is the latest GPU generation, NVIDIA's prior-generation A100 GPU continues to produce record results and high performance. In edge computing, NVIDIA Jetson AGX Orin, built for edge AI and robotic applications, ran every MLPerf edge benchmark, winning more tests than any other low-power system-on-a-chip. Orin performed up to 5x faster than the prior-generation Jetson AGX Xavier module while delivering an average of 2x better energy efficiency. In MLPerf v2.1, it also improved energy efficiency by up to 50% compared to its own MLPerf results from April.
In its first MLPerf Inference v2.1 submission, the H100 set new per-accelerator performance records on all workloads in the Data Center category and delivered up to 4.5x higher performance than the A100. NVIDIA credits the gains to Hopper architectural breakthroughs and to software optimizations that take advantage of the new capabilities.