AMD announces Instinct MI325X AI accelerator claiming superior performance vs Nvidia H200, with MI350 planned for further uplift.
AMD said its forthcoming 256GB Instinct MI325X GPU can outperform Nvidia's 141GB H200 processor on AI inference workloads and vowed that the next-generation MI350 accelerator chips will deliver a multifold improvement in performance. The Santa Clara, Calif.-based chip designer was expected to make these claims at its Advancing AI event in San Francisco, where the company planned to discuss how it will take on AI computing giant Nvidia with Instinct chips, EPYC CPUs, networking chips, an open software stack and data center design expertise. According to Forrest Norrod, head of AMD's Data Center Solutions business group, "AMD continues to deliver on our roadmap, offering customers the performance they need and the choice they want, to bring AI infrastructure, at scale, to market faster." When it comes to training AI models, AMD said the MI325X is on par with or slightly better than the H200, the successor to Nvidia's popular and powerful H100 GPU.
The MI325X is a follow-up to the Instinct MI300X, which launched last December and put AMD on the map as a worthy competitor to Nvidia in powerful AI accelerator chips. Whereas the Instinct MI300X features 192GB of HBM3 high-bandwidth memory and 5.3 TB/s of memory bandwidth, the MI325X, which is based on the same CDNA 3 GPU architecture as the MI300X, comes with 256GB of HBM3e memory and can reach 6 TB/s of memory bandwidth thanks to the move to the newer memory format. In terms of throughput, the MI325X has the same capabilities as the MI300X: 2.6 petaflops of 8-bit floating point (FP8) performance and 1.3 petaflops of 16-bit floating point (FP16) performance. The MI325X is set to arrive in systems from Dell Technologies, Lenovo, Supermicro, Hewlett Packard Enterprise, Gigabyte, Eviden and several other server vendors starting in the first quarter of next year. The release is part of AMD's new strategy to launch Instinct chips every year instead of every two years, a change made explicitly to keep up with Nvidia's accelerated chip release cadence.
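To put the memory upgrade in perspective, the short Python sketch below works out the generation-over-generation uplift from the per-GPU figures quoted above. The dictionary layout and the `percent_uplift` helper are illustrative names chosen for this example, not anything published by AMD.

```python
# Per-GPU figures as quoted in the article.
MI300X = {"memory_gb": 192, "bandwidth_tbs": 5.3}
MI325X = {"memory_gb": 256, "bandwidth_tbs": 6.0}


def percent_uplift(new: float, old: float) -> float:
    """Return the percentage increase of `new` over `old` (hypothetical helper)."""
    return (new / old - 1.0) * 100.0


memory_uplift = percent_uplift(MI325X["memory_gb"], MI300X["memory_gb"])
bandwidth_uplift = percent_uplift(MI325X["bandwidth_tbs"], MI300X["bandwidth_tbs"])

print(f"Memory capacity uplift: {memory_uplift:.0f}%")
print(f"Memory bandwidth uplift: {bandwidth_uplift:.0f}%")
```

By this arithmetic, the MI325X offers roughly a third more memory capacity and about 13 percent more memory bandwidth than the MI300X, while the quoted FP8 and FP16 throughput is unchanged.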
When comparing AI inference performance to the H200 at a chip level, AMD said the MI325X provides 40 percent faster throughput with an 8-group, 7-billion-parameter Mixtral model; 30 percent lower latency with a 7-billion-parameter Mixtral model; and 20 percent lower latency with a 70-billion-parameter Llama 3.1 model. The eight-chip Instinct MI325X platform features 2TB of HBM3e memory, 48 TB/s of memory bandwidth, 20.8 petaflops of FP8 performance and 10.4 petaflops of FP16 performance, with its eight MI325X GPUs connected over AMD's Infinity Fabric at a bandwidth of 896 GB/s. According to AMD, the MI325X platform has 80 percent higher memory capacity, 30 percent greater memory bandwidth and 30 percent faster FP8 and FP16 throughput than Nvidia's H200 HGX platform, which comes with eight H200 GPUs and started shipping earlier this year. Comparing inference performance to the H200 HGX platform, AMD said the MI325X platform provides 40 percent faster throughput with a 405-billion-parameter Llama 3.1 model and 20 percent lower latency with a 70-billion-parameter Llama 3.1 model. When training a 7-billion-parameter Llama 2 model on a single GPU, AMD said the MI325X is 10 percent faster than the H200, while the MI325X platform is on par with the H200 HGX platform when training a 70-billion-parameter Llama 2 model across eight GPUs.
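The platform-level numbers follow from the per-GPU specifications multiplied by eight. The Python sketch below reproduces that arithmetic and also derives the memory capacity ratio against the 141GB H200. It assumes simple linear scaling across GPUs and uses names (`platform_totals`, `capacity_ratio`) invented for illustration.

```python
GPUS_PER_PLATFORM = 8

# Per-GPU MI325X figures as quoted in the article.
mi325x = {
    "memory_gb": 256,
    "bandwidth_tbs": 6.0,
    "fp8_pflops": 2.6,
    "fp16_pflops": 1.3,
}


def platform_totals(per_gpu: dict, gpu_count: int) -> dict:
    """Scale each per-GPU figure linearly by the GPU count (an assumption)."""
    return {name: value * gpu_count for name, value in per_gpu.items()}


totals = platform_totals(mi325x, GPUS_PER_PLATFORM)

# H200 per-GPU memory capacity as quoted in the article.
H200_MEMORY_GB = 141
capacity_ratio = mi325x["memory_gb"] / H200_MEMORY_GB

for name, value in totals.items():
    print(f"{name}: {value:g}")
print(f"MI325X vs. H200 memory capacity: {capacity_ratio:.2f}x")
```

The totals come out to 2,048GB of memory, 48 TB/s of bandwidth, 20.8 petaflops of FP8 and 10.4 petaflops of FP16, matching the platform figures, and the per-GPU capacity ratio of about 1.8 is in line with AMD's claim of roughly 80 percent higher memory capacity.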
AMD said its next-generation Instinct MI350 accelerator chip series is on track to launch in the second half of next year and teased that it will provide up to a 35-fold improvement in inference performance compared to the MI300X, based on engineering estimates for an eight-GPU MI350 platform running a 1.8-trillion-parameter Mixture of Experts model. Based on AMD's next-generation CDNA 4 architecture and using a 3-nanometer manufacturing process, the MI350 series will include the MI355X GPU, which will feature 288GB of HBM3e memory and 8 TB/s of memory bandwidth. With the MI350 series supporting new 4-bit and 6-bit floating point formats (FP4, FP6), the MI355X is expected to achieve 9.2 petaflops with those formats, while FP8 and FP16 are expected to reach 4.6 petaflops and 2.3 petaflops, respectively. With eight MI355X GPUs, the Instinct MI355X platform is expected to offer 2.3TB of HBM3e memory, 64 TB/s of memory bandwidth, 18.5 petaflops of FP16 performance, 37 petaflops of FP8 performance and 74 petaflops of FP6 and FP4 performance. With this 74 petaflops of FP6 and FP4 performance, the MI355X platform is expected to be 7.4 times faster than the FP16 capabilities of the MI300X platform, and its 50 percent greater memory capacity means it can support models with up to 4.2 trillion parameters on a single system, six times larger than what was possible with the MI300X platform. After AMD debuts the MI355X in the second half of next year, the company plans to introduce the Instinct MI400 series in 2026 with a next-generation CDNA architecture.
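As a sanity check on the MI355X platform figures, the Python sketch below multiplies the quoted per-GPU numbers by eight and compares the results with the quoted platform numbers. The one percent tolerance and the `relative_gap` helper are assumptions made for this example; the small differences it reports are presumably the result of rounding in the published per-GPU values, though AMD's exact inputs are not given here.

```python
GPUS_PER_PLATFORM = 8

# Per-GPU MI355X figures as quoted in the article.
mi355x = {
    "memory_gb": 288,
    "bandwidth_tbs": 8.0,
    "fp16_pflops": 2.3,
    "fp8_pflops": 4.6,
    "fp4_fp6_pflops": 9.2,
}

# Platform figures as quoted in the article (2.3TB written as 2,300GB).
quoted_platform = {
    "memory_gb": 2300,
    "bandwidth_tbs": 64.0,
    "fp16_pflops": 18.5,
    "fp8_pflops": 37.0,
    "fp4_fp6_pflops": 74.0,
}


def relative_gap(computed: float, quoted: float) -> float:
    """Fractional difference between a computed and a quoted figure."""
    return abs(computed - quoted) / quoted


gaps = {}
for name, per_gpu in mi355x.items():
    computed = per_gpu * GPUS_PER_PLATFORM
    gaps[name] = relative_gap(computed, quoted_platform[name])
    print(f"{name}: computed {computed:g}, quoted {quoted_platform[name]:g}")

# Per-GPU memory capacity relative to the 192GB MI300X.
memory_ratio_vs_mi300x = mi355x["memory_gb"] / 192
print(f"Memory vs. MI300X: {memory_ratio_vs_mi300x:.1f}x")
```

Every computed total lands within one percent of the quoted platform figure, and the 288GB capacity works out to 1.5 times the MI300X's 192GB, consistent with the 50 percent increase cited above.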