An AMD executive confirmed that a customer is seriously considering an AI training cluster of roughly 1.2 million GPUs based on future MI500 chips, potentially Microsoft's $100 billion Stargate AI supercomputer.
Microsoft and OpenAI are collaborating on a groundbreaking data center project named 'Stargate,' set to launch by 2028. Financed by Microsoft to the tune of over $100 billion, the project aims to reduce the companies' reliance on Nvidia—something many tech giants involved in AI are looking to do these days.
According to reporting by The Next Platform in April, the AI supercomputer would likely be "based on future generations of Cobalt Arm server processors and Maia XPUs, with Ethernet scaling to hundreds of thousands to 1 million XPUs in a single machine." Specific details remain unclear, and it is uncertain whether Stargate will ever materialize, but a single machine with up to 1 million XPUs would be an ambitious undertaking by any measure.
Forrest Norrod, AMD's EVP and general manager of the Data Center Solutions Business Group, provided insight into the scale of such endeavors during a conversation with The Next Platform. When Timothy Prickett Morgan asked Norrod, "What's the biggest AI training cluster that somebody is serious about – you don't have to name names. Has somebody come to you and said with MI500, I need 1.2 million GPUs or whatever," Norrod replied, "It's in that range? Yes." Pressed for more details, he emphasized, "I am dead serious, it is in that range," and added crucial context: "I'm talking about one machine… The scale of what's being contemplated is mind blowing. Now, will all of that come to pass? I don't know. But there are public reports of very sober people contemplating spending tens of billions of dollars or even a hundred billion dollars on training clusters."
Because few companies could afford such a "mind-blowing" project with more than a million GPUs, it is reasonable to connect AMD's conversations to Stargate, though Norrod did not name the customer. For any company seeking to sidestep Nvidia's dominance, turning to AMD would make sense.