Nvidia launches Vera Rubin NVL72 AI supercomputer with 5x inference performance gains and 10x lower cost per token versus Blackwell, arriving 2H 2026.
At CES 2026, Nvidia CEO Jensen Huang officially launched Vera Rubin, the company's next-generation rack-scale architecture for AI data centers. It is designed to keep Nvidia at the forefront of AI as the technology reaches into robotics, autonomous vehicles, and the broader physical world. Vera Rubin is the result of what Nvidia calls "extreme co-design" across six types of chips: the Vera CPU, the Rubin GPU, the NVLink 6 switch, the ConnectX-9 SuperNIC, the BlueField-4 data processing unit, and the Spectrum-6 Ethernet switch. These building blocks combine to create the Vera Rubin NVL72 rack.
The Rubin GPU promises 50 PFLOPS of inference performance with the NVFP4 data type, 5x that of Blackwell GB200, and 35 PFLOPS of NVFP4 training performance, 3.5x that of Blackwell. To feed this compute, each Rubin GPU package includes eight stacks of HBM4 memory delivering 288 GB of capacity and 22 TB/s of bandwidth. The Vera CPU implements 88 custom Olympus Arm cores with "spatial multi-threading" for up to 176 threads in flight, and its NVLink C2C interconnect doubles in bandwidth to 1.8 TB/s. Each Vera CPU can address up to 1.5 TB of SOCAMM LPDDR5X memory with up to 1.2 TB/s of memory bandwidth.
To handle the communication demands of mixture-of-experts architectures and scale across multiple racks, Vera Rubin introduces NVLink 6 for scale-up networking, boosting per-GPU fabric bandwidth to 3.6 TB/s (bi-directional). Each NVLink 6 switch provides 28 TB/s of bandwidth, and each Vera Rubin NVL72 rack includes nine of these switches for roughly 260 TB/s of total scale-up bandwidth. For scaling out to DGX SuperPods of eight racks each, Nvidia is introducing Spectrum-X Ethernet switches with co-packaged optics built from its Spectrum-6 chip. The SN6800 offers 409.6 Tb/s of bandwidth across 512 ports of 800G Ethernet or 2048 ports of 200G, while the SN6810 offers 102.4 Tb/s across 128 ports of 800G or 512 ports of 200G Ethernet. Both switches are liquid-cooled, and Nvidia says they offer improved power efficiency, reliability, and uptime.
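The bandwidth figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, not an official calculation: it assumes 72 GPUs per rack (inferred from the "NVL72" name), and the function and variable names are made up for illustration. The per-GPU and per-switch numbers are the ones quoted in the paragraph.

```python
# Figures quoted in the article. GPUS_PER_RACK is an assumption based on the "NVL72" name.
GPUS_PER_RACK = 72
NVLINK6_PER_GPU_TBS = 3.6   # TB/s per GPU, bi-directional
NVLINK6_SWITCH_TBS = 28     # TB/s per NVLink 6 switch
SWITCHES_PER_RACK = 9


def ethernet_capacity_tbps(ports: int, port_speed_gbps: int) -> float:
    """Aggregate switch capacity in Tb/s for a given port configuration."""
    return ports * port_speed_gbps / 1000


# Two ways to estimate rack-level scale-up bandwidth; both land near the quoted ~260 TB/s.
scale_up_from_gpus = GPUS_PER_RACK * NVLINK6_PER_GPU_TBS
scale_up_from_switches = SWITCHES_PER_RACK * NVLINK6_SWITCH_TBS

# Port configurations quoted for the two Spectrum-6 based switches: (ports, Gb/s per port).
large_switch_configs = [(512, 800), (2048, 200)]
small_switch_configs = [(128, 800), (512, 200)]

if __name__ == "__main__":
    print(f"Scale-up via GPUs:     {scale_up_from_gpus:.1f} TB/s")
    print(f"Scale-up via switches: {scale_up_from_switches:.1f} TB/s")
    for ports, speed in large_switch_configs + small_switch_configs:
        total = ethernet_capacity_tbps(ports, speed)
        print(f"{ports} x {speed}G = {total:.1f} Tb/s")
```

Both port configurations of each Ethernet switch multiply out to the same total, which is consistent with the quoted 409.6 Tb/s and 102.4 Tb/s capacities. The two scale-up estimates differ slightly from each other, so the 260 TB/s figure appears to be rounded.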
As context windows grow to millions of tokens, Nvidia is introducing the Inference Context Memory Storage Platform, which uses next-generation BlueField-4 DPUs to create a new tier of memory for sharing and reusing key-value cache data across AI infrastructure. Nvidia says this improves responsiveness and throughput. For the first time, Vera Rubin also extends Nvidia's trusted execution environment to the entire rack by securing the chip, fabric, and network levels, which is meant to protect the confidentiality of frontier AI labs' state-of-the-art models. Each Vera Rubin NVL72 rack delivers 3.6 exaFLOPS of NVFP4 inference performance, 2.5 exaFLOPS of NVFP4 training performance, 54 TB of LPDDR5X memory connected to the Vera CPUs, and 20.7 TB of HBM4 offering 1.6 PB/s of bandwidth.
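The rack-level totals follow from the per-chip figures quoted earlier in the article. The sketch below is a sanity check under stated assumptions: the GPU count of 72 is inferred from the "NVL72" name, the CPU count is derived from the quoted memory totals rather than stated in the article, and all names in the code are hypothetical.

```python
# Per-chip figures quoted in the article.
RUBIN_INFERENCE_PFLOPS = 50   # NVFP4 inference per GPU
RUBIN_TRAINING_PFLOPS = 35    # NVFP4 training per GPU
RUBIN_HBM4_GB = 288           # HBM4 capacity per GPU
RUBIN_HBM4_TBS = 22           # HBM4 bandwidth per GPU, TB/s
VERA_LPDDR5X_TB = 1.5         # maximum LPDDR5X per CPU
RACK_LPDDR5X_TB = 54          # quoted rack-level LPDDR5X total

# Assumption: 72 GPUs per rack, inferred from the "NVL72" name.
GPUS_PER_RACK = 72


def rack_totals(gpus: int = GPUS_PER_RACK) -> dict:
    """Scale per-GPU figures to a full rack (decimal units: 1000 PFLOPS = 1 EFLOPS)."""
    return {
        "inference_eflops": gpus * RUBIN_INFERENCE_PFLOPS / 1000,
        "training_eflops": gpus * RUBIN_TRAINING_PFLOPS / 1000,
        "hbm4_tb": gpus * RUBIN_HBM4_GB / 1000,
        "hbm4_pbs": gpus * RUBIN_HBM4_TBS / 1000,
    }


# Derived, not stated in the article: CPU count implied by the memory totals.
implied_cpus_per_rack = RACK_LPDDR5X_TB / VERA_LPDDR5X_TB

if __name__ == "__main__":
    for name, value in rack_totals().items():
        print(f"{name}: {value:.3f}")
    print(f"implied CPUs per rack: {implied_cpus_per_rack:.0f}")
```

Under these assumptions the products come out to 3.6 exaFLOPS for inference, about 2.5 exaFLOPS for training, about 20.7 TB of HBM4, and about 1.6 PB/s of HBM4 bandwidth, matching the quoted rack figures after rounding. The memory totals imply 36 Vera CPUs per rack, or one CPU for every two GPUs.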
Nvidia has implemented several reliability, availability, and serviceability (RAS) improvements, including a cable-free modular tray design for quicker component swapping, improved NVLink resiliency that allows for zero-downtime maintenance, and a second-generation RAS engine that enables zero-downtime health checks. Pricing runs as high as $8.8 million per NVL72 rack. Nvidia claims the system requires only one-quarter the number of GPUs to train mixture-of-experts (MoE) models versus Blackwell, and that Rubin can cut the cost per token for MoE inference by as much as 10x across a broad range of models.