Microsoft aims to develop and deploy its own AI chips to reduce dependency on AMD and Nvidia.
Microsoft is working to shift the majority of its AI workloads from GPUs supplied by Nvidia and AMD to its own homegrown accelerators. The software giant arrived relatively late to the custom silicon space—while Amazon and Google have been building custom CPUs and AI accelerators for years, Microsoft only revealed its Maia AI accelerators in late 2023. The driving force behind this transition is a focus on performance per dollar, a metric Microsoft CTO Kevin Scott emphasized during a fireside chat moderated by CNBC on Wednesday. Scott acknowledged that Nvidia has offered the best price-performance to date, but Microsoft is willing to explore alternatives to meet demand. When asked whether the longer-term goal is to have mainly Microsoft silicon in the data center, Scott responded, "Yeah, absolutely."
Scott highlighted the broader system-design considerations that motivate the transition beyond raw compute performance alone. "It's about the entire system design. It's the networks and cooling, and you want to be able to have the freedom to make decisions that you need to make in order to really optimize your compute for the workload," he explained. Microsoft's first in-house AI accelerator, the Maia 100, demonstrated this approach by freeing up GPU capacity when the company shifted OpenAI's GPT-3.5 to its own silicon in 2023. However, with specifications of 800 teraFLOPS of BF16 performance, 64GB of HBM2e, and 1.8TB/s of memory bandwidth, the chip fell well short of competing GPUs from Nvidia and AMD.
Microsoft is reportedly bringing a second-generation Maia accelerator to market next year that will offer more competitive compute, memory, and interconnect performance. However, while a shift in the mix of GPUs to AI ASICs in Microsoft data centers is likely, these accelerators are unlikely to replace Nvidia and AMD's chips entirely. Google and Amazon have deployed tens of thousands of their TPUs and Trainium accelerators respectively, securing high-profile customer wins such as Anthropic, yet these chips are more often used to accelerate each company's own in-house workloads. Large-scale Nvidia and AMD GPU deployments continue across these cloud platforms partly because customers continue to demand them. Beyond AI accelerators, Microsoft is also developing other custom chips, including its own CPU called Cobalt and a suite of platform security silicon designed to accelerate cryptography and safeguard key exchanges across its vast datacenter domains.