Nvidia's roadmap disclosures confirm that Moore's Law has ended; future performance gains will come from architectural improvements, not density scaling.
Nvidia faces several interconnected challenges in accelerating AI compute growth. At GTC this month, Jensen Huang announced next-generation Blackwell Ultra processors and laid out detailed roadmaps for upcoming platforms, including a 600kW rack-scale system packing 576 GPUs and a new GPU family, named after Richard Feynman, arriving in 2028. The fundamental constraint is that advances in process technology have slowed to a crawl in recent years, leaving Nvidia with knobs that are increasingly difficult to turn. To compensate, Nvidia's strategy is to scale up the amount of silicon in each compute node as far as possible, expanding from today's densest systems, which mesh 72 GPUs into a single compute domain using its high-speed 1.8TB/s NVLink fabric, to 144 and eventually 576 GPUs per rack.
The power requirements of this scaling strategy are staggering. Blackwell accelerators required 500 watts more power than Hopper and twice the die count, yet each Blackwell die is only about 1.25x faster than a GH100 when normalized to FP16: 1,250 dense teraFLOPS versus 989. By 2027, Nvidia expects racks to surge to 600kW with the debut of the Rubin Ultra NVL576, which will jump from two reticle-limited dies per package to four while counting on a roughly 20 percent efficiency improvement from TSMC's 2nm process technology. Even with these gains, as Huang stated during his press Q&A, "the practical limit for a rack is however much power you can feed it," with current datacenter capacity at "250 megawatts" and potential future limits reaching "a gigawatt per rack."
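As a rough sanity check on the figures quoted above, the short Python sketch below recomputes the FP16 ratio and divides the 600kW rack budget across 576 GPUs. The per-GPU power number is only an upper bound under a simplifying assumption, namely that the entire rack budget goes to the GPUs, which ignores CPUs, switches, and cooling overhead. The function and constant names are illustrative, not Nvidia's.

```python
# Back-of-envelope check using only figures quoted in the text above.
BLACKWELL_FP16_TFLOPS = 1250  # dense FP16 teraFLOPS quoted for Blackwell
GH100_FP16_TFLOPS = 989       # dense FP16 teraFLOPS quoted for GH100
RACK_POWER_KW = 600           # rack power quoted for Rubin Ultra NVL576
GPUS_PER_RACK = 576           # GPU count quoted for that rack


def speedup(new: float, old: float) -> float:
    """Ratio of new throughput to old throughput."""
    return new / old


def watts_per_gpu(rack_kw: float, gpus: int) -> float:
    """Upper bound: assumes the whole rack budget is spent on GPUs."""
    return rack_kw * 1000 / gpus


fp16_ratio = speedup(BLACKWELL_FP16_TFLOPS, GH100_FP16_TFLOPS)
gpu_budget_w = watts_per_gpu(RACK_POWER_KW, GPUS_PER_RACK)

print(f"FP16 speedup: {fp16_ratio:.2f}x")      # 1.26x
print(f"Power per GPU: {gpu_budget_w:.0f} W")  # 1042 W
```

Under that assumption, each of the 576 GPUs would have a budget of roughly a kilowatt.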
Beyond compute scaling, Nvidia is also dramatically expanding memory capacity and bandwidth. Rubin Ultra will jump from 288GB per package on Rubin to 1TB, with roughly half of the increase coming from faster, higher-capacity memory modules and the other half from doubling the number of modules from eight on Blackwell and Rubin to 16 on Rubin Ultra. This makes it possible to fit around 2 trillion parameters at FP4 into a single package, or 500 billion per individual die. HBM4e is expected to effectively double memory bandwidth over HBM3e, with bandwidth jumping from around 4TB/s per Blackwell die to around 8TB/s on Rubin Ultra.
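The capacity arithmetic in that paragraph can be reproduced in a few lines of Python. This is a sketch under stated assumptions: 1TB is treated as a decimal terabyte, only model weights are counted (no KV cache or activations), and the package is assumed to hold four dies. The names are illustrative.

```python
# Weights-only capacity estimate from the figures quoted in the text.
PACKAGE_BYTES = 10**12   # 1TB per Rubin Ultra package (decimal TB assumed)
DIES_PER_PACKAGE = 4     # assumed: four dies per Rubin Ultra package
FP4_BITS = 4             # bits per parameter at FP4


def params_that_fit(capacity_bytes: int, bits_per_param: int) -> int:
    """Number of parameters whose weights alone fit in the given capacity."""
    return capacity_bytes * 8 // bits_per_param


package_params = params_that_fit(PACKAGE_BYTES, FP4_BITS)
per_die_params = package_params // DIES_PER_PACKAGE

# Per-module capacity implied by the quoted totals and module counts.
rubin_gb_per_module = 288 / 8    # Rubin: 288GB across eight modules
ultra_gb_per_module = 1000 / 16  # Rubin Ultra: 1TB across 16 modules

print(f"{package_params / 1e12:.1f} trillion parameters per package")  # 2.0
print(f"{per_die_params / 1e9:.0f} billion parameters per die")        # 500
print(f"{rubin_gb_per_module:.1f} GB -> {ultra_gb_per_module:.1f} GB per module")
```

In practice the usable figure would be lower, since inference also needs memory for activations and key-value caches.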
To offset the limited gains from process improvements, Nvidia is aggressively pursuing precision reduction: dropping from 16-bit to 8-bit precision effectively doubles throughput while halving memory requirements. From Hopper to Blackwell, Nvidia dropped four bits and doubled the silicon while claiming a 5x floating-point gain. The company continues to explore this avenue, with research under way on super-low-precision quantization down to 1.58 bits while maintaining accuracy, though below four-bit precision LLM inference becomes problematic, with rapidly climbing perplexity scores. With Blackwell Ultra, Nvidia prioritized compute-dense workloads by nerfing the chip's double-precision (FP64) tensor core performance in exchange for 50 percent more 4-bit FLOPS, signaling that specialized precision optimization is becoming central to GPU design.
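To make the precision trade-off concrete, the Python sketch below does two things. First, it shows how the weight footprint of a model shrinks with bit width, using a hypothetical 70-billion-parameter model that does not come from the source. Second, it offers one possible reading of the claimed 5x gain, assuming throughput scales inversely with bit width; this is not Nvidia's own breakdown.

```python
# Bit widths discussed in the text.
BITS = {"FP16": 16, "FP8": 8, "FP4": 4, "1.58-bit": 1.58}
HYPOTHETICAL_PARAMS = 70e9  # illustrative model size, not from the source


def weight_footprint_gb(params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


for name, bits in BITS.items():
    gb = weight_footprint_gb(HYPOTHETICAL_PARAMS, bits)
    print(f"{name:>8}: {gb:7.2f} GB")

# One way to read the claimed Hopper-to-Blackwell gain (an assumption):
CLAIMED_GAIN = 5.0
precision_factor = 8 / 4  # FP8 -> FP4, assuming throughput doubles
silicon_factor = 2.0      # twice the die count
residual = CLAIMED_GAIN / (precision_factor * silicon_factor)

print(f"Gain left after precision and silicon: {residual:.2f}x")  # 1.25x
```

On this reading, precision and die count account for 4x of the claimed 5x, leaving about 1.25x, which happens to line up with the FP16 comparison quoted earlier.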
The cumulative effect of these trends (larger die counts, higher power density, expanded memory hierarchies, and precision optimization) means Nvidia's compute platforms will continue to grow bigger, denser, hotter, and more power-hungry. While cooling megawatts of ultra-dense compute is not new to vendors like Cray, Eviden, and Lenovo, the scale and frequency have changed dramatically. What was once a handful of boutique compute clusters per year is now dozens of increasingly demanding configurations, which pose significant challenges for datacenter operators managing power delivery, thermal dissipation, and infrastructure costs.