Nvidia details Grace CPU Superchip design with 144 cores on TSMC's 4N process.
Nvidia has disclosed additional details about its Grace CPU Superchip ahead of its Hot Chips 34 presentation, confirming that the chips are manufactured on TSMC's specialized 4N process node. Grace is the company's first Arm-based CPU designed exclusively for the data center. The Grace CPU Superchip places two Grace chips on one board for a total of 144 cores, while the Grace Hopper Superchip combines a Hopper GPU and a Grace CPU on the same board. The 4N process is a variant of TSMC's 5nm node family, tuned for Nvidia's GPUs and CPUs through Design-Technology Co-Optimization (DTCO) between Nvidia and TSMC to achieve custom power, performance, and area characteristics.
Nvidia has confirmed that Grace uses Arm's Neoverse cores supporting the Armv9 architecture and SVE2 extensions. The Neoverse N2 platform is the likely candidate, given its support for Armv9, PCIe Gen 5.0, DDR5, HBM3, CCIX 2.0, and CXL 2.0. The next-generation Arm Poseidon cores are not expected to arrive until 2024, making them unlikely for Grace's planned early 2023 launch. Nvidia has also developed the Nvidia Scalable Coherency Fabric (SCF), a proprietary mesh interconnect similar to Arm's standard CMN-700 Coherent Mesh Network. The SCF provides 3.2 TB/s of bisection bandwidth between the Grace chip's CPU cores, memory, and I/O units, as well as the NVLink-C2C interface that connects to other units on the board.
The Grace CPU architecture features 72+ cores and 117 MB of total L3 cache. The cache is managed through eight SCF Cache partitions, and eight CPU units connect to the SCF mesh fabric via Cache Switch Nodes. The chip supports up to 68 PCIe lanes, including four PCIe 5.0 x16 connections that each deliver up to 128 GB/s of bidirectional throughput, along with 16 dual-channel LPDDR5X memory controllers providing 32 channels in total.
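As a rough sanity check on the I/O and cache figures above, the following Python sketch works through the arithmetic. The 32 GT/s per-lane rate and the 128b/130b encoding come from the PCIe 5.0 specification rather than from Nvidia's disclosure, the core count of 72 is an assumption (the lower bound of the "72+" figure), and all names are illustrative.

```python
# Figures quoted in the article.
TOTAL_PCIE_LANES = 68
X16_LINKS = 4
LANES_PER_X16_LINK = 16
L3_CACHE_MB = 117
CORES = 72  # assumption: lower bound of the "72+" figure

# PCIe 5.0 specification values (not taken from the article).
GT_PER_S_PER_LANE = 32            # raw signalling rate per lane, per direction
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding


def link_bandwidth_gb_s(lanes, after_encoding=False):
    """Return (per-direction, bidirectional) bandwidth in GB/s for a PCIe 5.0 link."""
    gbit_per_s = GT_PER_S_PER_LANE * lanes
    if after_encoding:
        gbit_per_s *= ENCODING_EFFICIENCY
    per_direction = gbit_per_s / 8  # convert bits to bytes
    return per_direction, 2 * per_direction


# Lanes consumed by the four x16 links, and lanes left over from the 68 total.
lanes_in_x16_links = X16_LINKS * LANES_PER_X16_LINK
remaining_lanes = TOTAL_PCIE_LANES - lanes_in_x16_links

# Raw and post-encoding bandwidth of a single x16 link.
raw_one_way, raw_both_ways = link_bandwidth_gb_s(LANES_PER_X16_LINK)
enc_one_way, enc_both_ways = link_bandwidth_gb_s(LANES_PER_X16_LINK, after_encoding=True)

# Average L3 capacity per core, under the 72-core assumption.
l3_per_core_mb = L3_CACHE_MB / CORES

print(f"x16 links use {lanes_in_x16_links} lanes, leaving {remaining_lanes}")
print(f"x16 raw: {raw_one_way:.1f} GB/s per direction, {raw_both_ways:.1f} GB/s bidirectional")
print(f"x16 after encoding: {enc_both_ways:.1f} GB/s bidirectional")
print(f"L3 per core: {l3_per_core_mb:.3f} MB")
```

The raw figure matches the quoted 128 GB/s per x16 link; after 128b/130b encoding the bidirectional figure drops to about 126 GB/s, before any protocol overhead. The four x16 links account for 64 of the 68 lanes.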
The LPDDR5X memory subsystem supports up to 512 GB of total memory with up to 546 GB/s of throughput. Nvidia selected LPDDR5X over HBM2e for several reasons, including capacity and cost, and notes that LPDDR5X provides 53% more bandwidth and 1/8th the power per GB compared to standard DDR5 memory.
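To put the memory figures in per-channel terms, here is a minimal Python sketch that divides the quoted totals across the 16 dual-channel controllers. It assumes capacity and bandwidth are spread evenly across controllers and channels, which the disclosure does not state explicitly; the variable names are illustrative.

```python
# Figures quoted in the article.
CONTROLLERS = 16
CHANNELS_PER_CONTROLLER = 2
MAX_MEMORY_GB = 512
PEAK_BANDWIDTH_GB_S = 546

# Total channel count across all controllers.
total_channels = CONTROLLERS * CHANNELS_PER_CONTROLLER

# Assumption: bandwidth and capacity are divided evenly.
bandwidth_per_channel = PEAK_BANDWIDTH_GB_S / total_channels
bandwidth_per_controller = PEAK_BANDWIDTH_GB_S / CONTROLLERS
capacity_per_controller_gb = MAX_MEMORY_GB / CONTROLLERS

print(f"Channels: {total_channels}")
print(f"Bandwidth per channel: {bandwidth_per_channel:.2f} GB/s")
print(f"Bandwidth per controller: {bandwidth_per_controller:.2f} GB/s")
print(f"Capacity per controller: {capacity_per_controller_gb:.0f} GB")
```

Under that even-split assumption, each channel carries roughly 17 GB/s and each controller serves 32 GB at the maximum configuration.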