Chips & Hardware · Report

Technical analysis of NVIDIA's Blackwell GPU architecture and RTX 50-series performance improvements.

Blackwell represents next-generation compute capability, setting product roadmap expectations for data center customers.

Trade press · Slicast · January 15, 2025 · Global · Source: tomshardware.com

importance 60

Nvidia presented the Blackwell GPU architecture for the upcoming RTX 50-series GPUs at its Editors' Day during CES 2025, detailing core functionality across seven sessions. The company outlined four primary goals for Blackwell: optimize for new neural workloads, reduce memory footprint, introduce new quality of service capabilities, and improve energy efficiency. Many of the upgrades focus on AI and neural rendering technologies, and the architectural changes relative to the RTX 40-series Ada Lovelace generation are meaningful but incremental. The exception is the RTX 5090, which has a substantially larger GPU die: 744 mm² compared to 608 mm² on the 4090.

The performance improvements, while meaningful, are more modest than in previous generational leaps. Nvidia quotes "up to 4,000 AI TOPS" (trillions of operations per second) for Blackwell; the RTX 5090, the flagship consumer card, is rated at about 3,400 TOPS (3,352 precisely). Comparing like-for-like specifications, the RTX 5090 achieves 1,676 TFLOPS of FP8 versus the RTX 4090's 1,321 TFLOPS, a 27% increase. Similarly, the RTX 5090 delivers up to 104.8 TFLOPS of FP32 against the RTX 4090's 82.6 TFLOPS, again a 27% improvement. By contrast, the RTX 4090 delivered a 132% increase in GPU TFLOPS over the RTX 3090. The 5090 die is 22% larger than the 4090's and has 21% more transistors, with both chips manufactured on the same TSMC 4N process node.
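The generational percentages quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the helper names `pct_increase` and `implied_baseline` are hypothetical and introduced purely for illustration.

```python
def pct_increase(new: float, old: float) -> float:
    """Percentage increase going from `old` to `new`."""
    return (new / old - 1.0) * 100.0


def implied_baseline(new: float, pct: float) -> float:
    """Value that `new` must have grown from, given a `pct` percent increase."""
    return new / (1.0 + pct / 100.0)


# (new, old) pairs taken from the figures quoted in the text.
comparisons = {
    "FP8 TFLOPS, RTX 5090 vs RTX 4090": (1676.0, 1321.0),
    "FP32 TFLOPS, RTX 5090 vs RTX 4090": (104.8, 82.6),
    "Die area in mm^2, RTX 5090 vs RTX 4090": (744.0, 608.0),
}

for label, (new, old) in comparisons.items():
    print(f"{label}: +{pct_increase(new, old):.1f}%")

# The quoted 132% jump to the RTX 4090's 82.6 TFLOPS implies this
# FP32 figure for the RTX 3090 (derived here, not stated in the text).
print(f"Implied RTX 3090 FP32: {implied_baseline(82.6, 132.0):.1f} TFLOPS")
```

Both the FP8 and FP32 comparisons round to the same 27% because the underlying ratio is nearly identical in each case.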

Architecturally, Blackwell introduces several noteworthy enhancements. The 4th-generation RT cores have twice the ray-triangle intersection rate of Ada and support Mega Geometry, which could benefit future Unreal Engine 5 games. Nvidia has made all shader cores in Blackwell capable of both FP32 and INT32 operations, a shift from Ada, where only half the cores supported INT32. The GPU shaders have been enhanced for Neural Shaders, allowing better intermixing of shader and tensor core operations, and Shader Execution Reordering (SER) is now twice as fast on Blackwell as on Ada. Blackwell is also the first consumer GPU series to support DisplayPort 2.1 UHBR20 at 80 Gbps and PCIe 5.0, and its video encoding and decoding hardware now supports 4:2:2 video streams.

Memory is a significant upgrade, with Blackwell moving from the GDDR6 and GDDR6X of the Ada generation to GDDR7. This is the first major memory shift since the RTX 20-series in 2018, which introduced GDDR6 clocked at 14 Gbps. Most Blackwell RTX 50-series GPUs run GDDR7 at 28 Gbps, twice as fast as the original GDDR6 chips but only 33% faster than the 21 Gbps GDDR6X chips used in higher-spec RTX 40-series GPUs. The RTX 5080 receives a speed bump to 30 Gbps GDDR7, nearly double the 15.5 Gbps memory on the 2080 Super. Evidence suggests the GDDR7 transition may be universal across the lineup, with even the RTX 5070 laptop GPU featuring 8 GB of GDDR7.
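The memory speed comparisons above reduce to simple ratios of per-pin data rates. The sketch below reproduces them and, as an illustration, converts a data rate to total bandwidth with the standard relation bandwidth = data rate × bus width / 8. The text does not give bus widths, so the 256-bit value is an assumed example, and the function names are hypothetical.

```python
def speedup_pct(new_gbps: float, old_gbps: float) -> float:
    """Percentage by which `new_gbps` exceeds `old_gbps`."""
    return (new_gbps / old_gbps - 1.0) * 100.0


def bandwidth_gb_per_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Total memory bandwidth in GB/s for a per-pin data rate and bus width."""
    return data_rate_gbps * bus_width_bits / 8


# Per-pin data rates (Gbps) quoted in the text.
GDDR6_ORIGINAL = 14.0      # RTX 20-series launch memory
GDDR6_2080_SUPER = 15.5    # RTX 2080 Super
GDDR6X_ADA = 21.0          # higher-spec RTX 40-series
GDDR7_TYPICAL = 28.0       # most RTX 50-series
GDDR7_5080 = 30.0          # RTX 5080

print(f"GDDR7 vs original GDDR6: +{speedup_pct(GDDR7_TYPICAL, GDDR6_ORIGINAL):.0f}%")
print(f"GDDR7 vs GDDR6X: +{speedup_pct(GDDR7_TYPICAL, GDDR6X_ADA):.0f}%")
print(f"RTX 5080 vs 2080 Super: {GDDR7_5080 / GDDR6_2080_SUPER:.2f}x")

# Assumed bus width for illustration only; not taken from the text.
EXAMPLE_BUS_WIDTH_BITS = 256
print(f"28 Gbps on a {EXAMPLE_BUS_WIDTH_BITS}-bit bus: "
      f"{bandwidth_gb_per_s(GDDR7_TYPICAL, EXAMPLE_BUS_WIDTH_BITS):.0f} GB/s")
```

Because total bandwidth also scales with bus width, the per-pin gains listed here are only part of the picture when comparing specific cards.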
