Nvidia previews Rubin CPX GPU architecture optimized for inference and long-context AI workloads.
Nvidia is repositioning itself for a significant shift in the AI data center market, moving from its traditional dominance in training workloads to the more diverse landscape of inference. On Tuesday, the Santa Clara, California-based GPU manufacturer unveiled Rubin CPX, a new class of GPU designed specifically for massive-context processing, which the company says will let AI systems handle million-token software coding and generative video applications. Nvidia promises energy efficiency and high performance for inference tasks and claims the platform can generate $5 billion in token revenue for every $100 million invested. Rubin CPX will operate within Nvidia's Vera Rubin NVL144 CPX platform.
This strategic shift addresses a rapidly expanding opportunity. The global AI inference market was estimated at $106 billion in 2025 and is projected to grow to $255 billion by 2030, according to a MarketsandMarkets report. Nvidia's new inferencing data center platform, powered by Blackwell Ultra and the upcoming Vera Rubin GPUs, is designed to tackle the most demanding workloads, particularly those utilizing the Mixture of Experts (MoE) LLM architecture that drives so-called "AI factories."
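For readers who want the yearly growth rate implied by those two market figures, the short Python sketch below works it out. It assumes steady compounding over the five years from 2025 to 2030, which the report excerpt quoted here does not state; the function name `cagr` and the variable names are illustrative only.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate, returned as a fraction (0.10 means 10%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Market estimates quoted in the article, in billions of US dollars.
market_2025 = 106.0
market_2030 = 255.0

# Assumption: growth compounds evenly across the 2025-2030 window.
rate = cagr(market_2025, market_2030, years=2030 - 2025)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption the market would need to grow by roughly 19% a year to move from $106 billion to $255 billion.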
Matt Kimball, vice president and principal analyst for Moor Insights & Strategy, endorsed Nvidia's inference-focused strategy, telling Data Center Knowledge that "Rubin is a beast of a part… just as Blackwell was a beast compared to Hopper. You're talking about opening up faster and bigger inferencing, [and] opening up those token windows." However, Kimball noted that Rubin CPX is not aimed at average enterprise players, stating: "This is taking Rubin and creating a specialized inference part that is really geared toward the high end," with hyperscalers and large enterprises likely forming the bulk of customers.
Shar Narasimhan, Nvidia's director of marketing for AI and data center GPUs, highlighted Rubin CPX's market positioning, saying "[Rubin CPX] unlocks a new tier of premium use cases like intelligent coding systems and video generation" and adding, "It will dramatically increase the productivity and performance of AI factories."
On Tuesday, Nvidia also demonstrated the capabilities of its Blackwell Ultra architecture through benchmark results for the GB300 NVL72 rack-scale system, which delivered 1.4 times higher DeepSeek-R1 inference throughput than its predecessor and set records on all new data center benchmarks added to the MLPerf Inference v5.1 suite, including those for Llama 3.1 405B Interactive, Llama 3.1 8B and Whisper. Dave Salvator, Nvidia's director of accelerated computing products, stated: "I'm very pleased with these numbers. And we expect these numbers to increase over time as we continue to optimize the Blackwell Ultra software stack." According to Nvidia, these benchmark results underscore Blackwell Ultra's potential to increase productivity for AI factories while boosting revenue and driving down the cost of ownership.