Intel details Ponte Vecchio GPU architecture with optimizations for HPC and AI workloads.
Intel offered its closest look yet at its flagship datacenter GPU, codenamed Ponte Vecchio, at the Hot Chips conference, with its own internal benchmarks showing the chip outperforming AMD's MI250X and competing head-to-head with Nvidia's upcoming H100 GPU.

Announced last year, Ponte Vecchio is Intel's first serious attempt at a high-performance GPU for AI/ML and HPC applications. The chip is actually a series of memory and compute dies glued together into a "stack" using a combination of Intel's Foveros and EMIB packaging technologies, with two such stacks per accelerator. According to Intel Fellow Hong Jiang, the stacks can behave as a pair of GPU dies or as a single logical die, depending on application needs.

Intel claims Ponte Vecchio will deliver 52 teraflops of peak performance, thanks to a design choice that gives it the same peak FP32 and FP64 throughput. That puts it just ahead of AMD's MI250X at 47.9 teraflops (FP64) and within spitting distance of the H100's 60 teraflops (FP64). Intel also discussed its XMX matrix accelerators, which are analogous to Nvidia's tensor cores, claiming the GPU will deliver 419 teraflops in single-precision matrix calculations.
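To put "just ahead" and "within spitting distance" into numbers, the short sketch below computes Ponte Vecchio's relative gap against each rival using only the peak FP64 figures quoted in this article. These are vendor-claimed peak numbers, not measured results, and the helper name `relative_gap` is purely illustrative.

```python
# Peak FP64 figures (teraflops) as quoted in this article.
# These are vendor-claimed peaks, not independently measured results.
PEAK_FP64_TFLOPS = {
    "Intel Ponte Vecchio": 52.0,
    "AMD MI250X": 47.9,
    "Nvidia H100": 60.0,
}


def relative_gap(a: float, b: float) -> float:
    """Return how far a is above (+) or below (-) b, as a percentage of b."""
    return (a - b) / b * 100.0


if __name__ == "__main__":
    pvc = PEAK_FP64_TFLOPS["Intel Ponte Vecchio"]
    for name, tflops in PEAK_FP64_TFLOPS.items():
        if name == "Intel Ponte Vecchio":
            continue  # skip comparing the chip with itself
        print(f"Ponte Vecchio vs {name}: {relative_gap(pvc, tflops):+.1f}%")
```

On these claimed figures, Ponte Vecchio lands roughly 8.6 percent ahead of the MI250X and roughly 13.3 percent behind the H100.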
Some of that performance can be attributed to Ponte Vecchio's large on-chip memories, which include a 64MB register file, 64MB of L1 cache and 408MB of L2 cache, backed by 128GB of HBM. "This really helps us to keep data on die instead of having to go out to the HBM memory," Jiang said.

Both Intel's and Nvidia's GPUs rely on PCIe 5.0 for connectivity to the host, which means comparing them to AMD's PCIe 4.0-based MI200-series GPUs isn't exactly apples to apples. The new PCIe spec offers twice the bandwidth to the host but requires a next-gen CPU from either Intel or AMD, neither of which is available yet. While Nvidia could easily opt for AMD's fourth-gen Epyc chips when they launch this fall or use its own Grace CPUs, Intel appears to be sticking with an all-Intel architecture. At Hot Chips it showed off four liquid-cooled Ponte Vecchio GPUs paired with two Sapphire Rapids Xeon Scalable processors in a 1U chassis, though up to eight GPUs can be connected in a single node using the company's Xe Link fabric.
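The "twice the bandwidth" claim follows from the per-lane transfer rates in the PCIe specifications: PCIe 4.0 runs at 16 GT/s per lane and PCIe 5.0 at 32 GT/s, and both use 128b/130b line encoding. The sketch below assumes a x16 link, since the article does not state the lane width, and ignores packet and protocol overhead beyond line encoding, so treat the results as upper bounds rather than achievable throughput.

```python
# Per-lane transfer rates in gigatransfers/second, per the PCIe 4.0 and 5.0 specs.
GT_PER_S = {"PCIe 4.0": 16.0, "PCIe 5.0": 32.0}

# Both generations use 128b/130b encoding: 128 payload bits per 130 bits sent.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbytes(gen: str, lanes: int = 16) -> float:
    """Approximate one-direction bandwidth in GB/s for a link.

    Assumes a x16 link by default and accounts only for line encoding,
    not TLP headers, flow control or other protocol overhead.
    """
    bits_per_s = GT_PER_S[gen] * 1e9 * lanes * ENCODING_EFFICIENCY
    return bits_per_s / 8 / 1e9  # bits -> bytes -> gigabytes


if __name__ == "__main__":
    for gen in GT_PER_S:
        print(f"{gen} x16: {link_bandwidth_gbytes(gen):.1f} GB/s per direction")
```

Under these assumptions a PCIe 4.0 x16 link tops out around 31.5 GB/s per direction, versus about 63.0 GB/s for PCIe 5.0 x16.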
Intel has pushed back Ponte Vecchio's arrival to Q1 2023, more than a year and a half after it was supposed to launch. Meanwhile, AMD is slated to release its Instinct MI300 accelerators in 2023, billing them as the "first datacenter APUs" with a Zen 4 processor co-packaged alongside a CDNA 3-based GPU, and claiming an eightfold performance improvement over the MI250X. Nvidia revealed its Grace Hopper Superchip at GTC this spring, which pairs its Arm-based Grace CPU with a GH100 GPU, 512GB of LPDDR5X and 80GB of HBM3 memory on a single 1,000W package. Not to be left out, Intel announced similar plans in May for its Falcon Shores XPU, which will merge its HBM-equipped Sapphire Rapids CPU and Ponte Vecchio GPU stack into a single package, claiming a fivefold improvement in performance per watt, memory capacity and bandwidth compared with "current platforms."
Ponte Vecchio not only faces competition from Nvidia and AMD; if it is held back much longer, the chip could also find its lifespan cut short by its successor, codenamed Rialto Bridge. Rialto Bridge, which is supposed to begin sampling next year, will see Intel raise power consumption to 800W per module and will require liquid cooling. At least one customer is eagerly awaiting Ponte Vecchio's arrival: the Department of Energy's Argonne National Laboratory, which plans to use the chips in its Aurora supercomputer, a project plagued by epic Intel-driven delays.