Chips & Hardware · Report

NVIDIA Rubin CPX specifications: up to 128 GB GDDR7 memory, 30 PFLOPS FP4, optimized for million-token context and generative AI inference.

Memory and compute scaling roadmap signals infrastructure demand for long-context inference and parameter-efficient generative models.

Trade press · Slicast · September 9, 2025 · Global · Source: wccftech.com

NVIDIA has announced new details about its next-generation Rubin AI platform, unveiling the Rubin CPX, a GPU purpose-built for massive-context processing. According to NVIDIA, the Rubin CPX lets AI systems handle million-token software coding and generative video workloads. The chip integrates video decoders and encoders alongside long-context inference processing in a single design, targeting long-format applications such as video search and high-quality generative video.

The Rubin CPX will work alongside NVIDIA Vera CPUs and Rubin GPUs within the new Vera Rubin NVL144 CPX platform, an integrated MGX system delivering 8 exaflops of AI compute in a single rack. NVIDIA says this is 7.5x the AI performance of its GB300 NVL72 systems, paired with 100 TB of fast memory and 1.7 PB/s of memory bandwidth. The platform also offers 3x higher attention performance than the GB300 NVL72. Individual Rubin CPX compute trays will also be available for customers looking to reuse existing Vera Rubin NVL144 systems.
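As a rough sanity check on the rack-level figures above, the short Python sketch below derives two numbers the report does not state directly: the GB300 NVL72 performance implied by the 7.5x claim, and how many times per second the quoted bandwidth could read the full quoted memory capacity. The function names are illustrative, and the results are back-of-the-envelope estimates from the quoted figures, not official NVIDIA specifications.

```python
# Figures quoted in the report for the Vera Rubin NVL144 CPX rack.
CPX_RACK_EXAFLOPS = 8.0            # AI compute per rack
SPEEDUP_VS_GB300_NVL72 = 7.5       # vendor-stated ratio
FAST_MEMORY_TB = 100.0             # fast memory per rack
MEMORY_BANDWIDTH_PB_PER_S = 1.7    # memory bandwidth per rack


def implied_baseline_exaflops(new_exaflops: float, speedup: float) -> float:
    """Performance of the baseline system implied by a 'speedup x' claim."""
    return new_exaflops / speedup


def full_memory_sweeps_per_second(bandwidth_pb_per_s: float, capacity_tb: float) -> float:
    """How many times per second the whole memory could be read once."""
    bandwidth_tb_per_s = bandwidth_pb_per_s * 1000.0  # 1 PB = 1000 TB
    return bandwidth_tb_per_s / capacity_tb


baseline = implied_baseline_exaflops(CPX_RACK_EXAFLOPS, SPEEDUP_VS_GB300_NVL72)
sweeps = full_memory_sweeps_per_second(MEMORY_BANDWIDTH_PB_PER_S, FAST_MEMORY_TB)

print(f"Implied GB300 NVL72 baseline: {baseline:.2f} exaflops")
print(f"Full-memory sweeps per second: {sweeps:.1f}")
```

Under these assumptions the implied baseline is roughly 1.07 exaflops, and the rack could in principle read its entire 100 TB about 17 times per second. The headline ratio may compare different numeric precisions, so the baseline should be read as an estimate only.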

The Rubin CPX GPU delivers 30 PFLOPS of NVFP4 AI compute and carries up to 128 GB of GDDR7 memory; NVIDIA chose GDDR7 over HBM for cost efficiency. The chip has 4x the NVENC and NVDEC capability of previous generations, expanding its video processing capacity for generative AI tasks. Built on a cost-efficient monolithic die, the Rubin CPX is optimized for performance and energy efficiency in AI inference.

In comparison, the non-CPX Vera Rubin NVL144 platform features four Rubin GPUs and two Vera CPUs, offering 3.6 exaflops of NVFP4 compute, 1.4 PB/s of HBM4 bandwidth, and 75 TB of memory capacity. The Vera Rubin CPX platform offers 7.5x higher AI compute, 3.0x higher bandwidth, and 4.0x higher memory (150 TB in GDDR7) compared to the Grace Blackwell platform. While the standard Rubin platform uses 2-reticle-sized GPUs and Rubin Ultra uses 4-reticle-sized GPUs, the CPX chip uses a single monolithic die. NVIDIA expects the first Rubin CPX systems to be available by the end of 2026, while Vera Rubin is expected to enter production soon, with a full unveiling planned for GTC 2026.
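The gap between the two platforms can be checked with simple arithmetic. The sketch below uses only figures quoted in this report (8 exaflops and 1.7 PB/s for the NVL144 CPX rack, 3.6 exaflops and 1.4 PB/s for the non-CPX NVL144, and 30 PFLOPS per Rubin CPX GPU). It assumes, purely for illustration, that the extra compute in the CPX rack comes entirely from Rubin CPX GPUs added on top of the non-CPX configuration. The report does not state the GPU count, so the derived number is an estimate.

```python
# Figures quoted in the report.
NVL144_CPX_EXAFLOPS = 8.0      # Vera Rubin NVL144 CPX rack
NVL144_EXAFLOPS = 3.6          # non-CPX Vera Rubin NVL144
NVL144_CPX_PB_PER_S = 1.7      # memory bandwidth, CPX rack
NVL144_PB_PER_S = 1.4          # HBM4 bandwidth, non-CPX platform
CPX_GPU_PFLOPS = 30.0          # NVFP4 compute per Rubin CPX GPU


def ratio(new: float, old: float) -> float:
    """Simple 'how many times larger' ratio."""
    return new / old


def implied_cpx_gpu_count(cpx_rack_ef: float, base_rack_ef: float, per_gpu_pf: float) -> float:
    """Number of CPX GPUs needed to account for the extra compute.

    Assumption (not from the report): the CPX rack's compute equals the
    non-CPX platform's compute plus the contribution of the CPX GPUs.
    """
    extra_pflops = (cpx_rack_ef - base_rack_ef) * 1000.0  # 1 exaflop = 1000 PFLOPS
    return extra_pflops / per_gpu_pf


compute_ratio = ratio(NVL144_CPX_EXAFLOPS, NVL144_EXAFLOPS)
bandwidth_ratio = ratio(NVL144_CPX_PB_PER_S, NVL144_PB_PER_S)
gpu_count = implied_cpx_gpu_count(NVL144_CPX_EXAFLOPS, NVL144_EXAFLOPS, CPX_GPU_PFLOPS)

print(f"Compute ratio (CPX vs non-CPX): {compute_ratio:.2f}x")
print(f"Bandwidth ratio (CPX vs non-CPX): {bandwidth_ratio:.2f}x")
print(f"Implied Rubin CPX GPUs per rack: {gpu_count:.0f}")
```

On these numbers the CPX rack has roughly 2.2x the compute and 1.2x the bandwidth of the non-CPX platform, and the extra 4.4 exaflops corresponds to about 147 CPX GPUs at 30 PFLOPS each. That is close to the 144 in the platform name, which would fit if the rack-level figure is rounded.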
