Chips & Hardware · Report

Comparative technical analysis of Intel Gaudi AI processors versus Nvidia GPUs for datacenter deployment.

Validates Intel's return to the competitive AI accelerator market; pressures Nvidia on price-performance and forces datacenter operators to evaluate alternatives.

Trade press · Slicast · June 13, 2024 · Global · Source: nextplatform.com

At the Computex IT conference in Taipei, Taiwan, Intel revealed list pricing for its Gaudi 2 and Gaudi 3 accelerators, something neither Nvidia nor AMD has done, along with benchmark results demonstrating competitive performance. Intel is pursuing this transparency strategy to establish value positioning for its future AI accelerators while generating sales to fund the development and launch of "Falcon Shores" in late 2025 and "Falcon Shores 2" in 2026. The Gaudi 3 chip, which started shipping in April, marks the end of the Gaudi line, which Intel gained through its $2 billion acquisition of Habana Labs in December 2019.

Falcon Shores is a technical bridge between Intel's existing and future AI compute. As Intel revealed in June 2023, Falcon Shores will merge the massively parallel Ethernet fabric and matrix math units of the Gaudi line with the Xe GPU engines created for Ponte Vecchio, enabling simultaneous 64-bit floating point processing and matrix math processing. Ponte Vecchio lacks that capability, which limits its appeal for AI workloads. Falcon Shores will consume 1,500 watts, 25 percent more than Nvidia's top-end "Blackwell" B200 GPU, which is rated at 1,200 watts and delivers 20 petaflops of compute at FP4 precision. Intel will need its Intel 18A manufacturing process, expected in production in 2025, to deliver sufficient performance gains, with Falcon Shores 2 targeting the smaller Intel 14A process expected in 2026.
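
The power comparison above is simple arithmetic, and a short Python sketch makes it easy to check. The wattage and FP4 figures are the ones quoted in this report; the variable and function names are illustrative. The article gives no throughput figure for Falcon Shores, so the "break-even" number is only a derived yardstick (the FP4 rate at which a 1,500 watt part would match the B200's performance per watt), not a claim about the actual chip.

```python
# Figures quoted in the report; names below are illustrative, not Intel's or Nvidia's.
FALCON_SHORES_WATTS = 1500
B200_WATTS = 1200
B200_FP4_PETAFLOPS = 20


def percent_more(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100.0


# Power premium of Falcon Shores over the B200.
power_premium_pct = percent_more(FALCON_SHORES_WATTS, B200_WATTS)  # 25.0

# B200 efficiency at FP4, in teraflops per watt.
b200_fp4_teraflops_per_watt = B200_FP4_PETAFLOPS * 1000 / B200_WATTS

# Assumption-driven yardstick: FP4 petaflops a 1,500 W device would need
# to equal the B200's performance per watt. Not a reported specification.
break_even_fp4_petaflops = B200_FP4_PETAFLOPS * FALCON_SHORES_WATTS / B200_WATTS

if __name__ == "__main__":
    print(f"Power premium: {power_premium_pct:.1f}%")
    print(f"B200 FP4 efficiency: {b200_fp4_teraflops_per_watt:.2f} TF/W")
    print(f"Break-even FP4 throughput: {break_even_fp4_petaflops:.1f} PF")
```

On these numbers, a 1,500 watt device would need roughly 25 petaflops at FP4 just to match the B200 on efficiency, which is one way to read the report's point about needing process gains from Intel 18A.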

Intel's benchmark data positions Gaudi 3 competitively against current Nvidia hardware. For training, Intel demonstrated performance on GPT-3 with 175 billion parameters and Llama 2 with 70 billion parameters. The GPT-3 tests ran on clusters of 8,192 accelerators, pitting Gaudi 3 with 128 GB of HBM against the Nvidia H100 with 80 GB of HBM, and the Llama 2 tests ran on machines with 64 devices. For inference, Intel presented Gaudi 3 with 128 GB of HBM against both the H100 with 80 GB of HBM and the H200 with 141 GB of HBM across multiple model configurations.
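
To put the memory capacities in the benchmark setup side by side, here is a minimal Python sketch. The per-device HBM sizes and device counts come from this report; the helper names are hypothetical, and treating GB as decimal units when converting to TB is an assumption made for simplicity. It compares capacity only and says nothing about bandwidth or measured performance.

```python
# Per-device HBM capacity in GB, as quoted in the report.
HBM_GB = {"Gaudi 3": 128, "H100": 80, "H200": 141}

# Device counts used in Intel's training comparisons, per the report.
GPT3_CLUSTER_DEVICES = 8192
LLAMA2_DEVICES = 64


def capacity_ratio(device: str, baseline: str) -> float:
    """HBM capacity of `device` relative to `baseline`."""
    return HBM_GB[device] / HBM_GB[baseline]


def aggregate_hbm_tb(device: str, count: int) -> float:
    """Total HBM across `count` devices, in decimal terabytes (assumed 1 TB = 1000 GB)."""
    return HBM_GB[device] * count / 1000


if __name__ == "__main__":
    print(f"Gaudi 3 vs H100: {capacity_ratio('Gaudi 3', 'H100'):.2f}x")
    print(f"Gaudi 3 vs H200: {capacity_ratio('Gaudi 3', 'H200'):.2f}x")
    for name in ("Gaudi 3", "H100"):
        total = aggregate_hbm_tb(name, GPT3_CLUSTER_DEVICES)
        print(f"{name} x {GPT3_CLUSTER_DEVICES}: {total:.1f} TB of HBM")
```

Per device, Gaudi 3 carries 1.6 times the HBM of the H100 and about 0.91 times that of the H200, so the H200 comparison is the closer one on capacity.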

Intel's commercial motivation is clear: the company said in October that it had a $2 billion pipeline for Gaudi accelerator sales, and added in April that it expected $500 million in Gaudi accelerator sales in 2024. These figures pale against AMD's expected $4 billion in GPU sales this year and Nvidia's projected $100 billion or more in datacenter compute revenue, but clearing the $2 billion pipeline is essential to funding Falcon Shores and Falcon Shores 2. With TSMC maintaining a relentless innovation drumbeat and Nvidia advancing its roadmap with Blackwell Ultra in 2025 and Rubin in 2026, Intel faces mounting pressure to demonstrate execution across both its foundry and chip design businesses.
