Nvidia and Microsoft have announced a partnership to jointly build a large-scale AI supercomputer for enterprise AI and cloud services.
Nvidia Corp. and Microsoft Corp. announced a partnership to build a supercomputer optimized for running artificial intelligence software. The system will be hosted on Microsoft's Azure public cloud platform. "AI technology advances as well as industry adoption are accelerating," said Manuvir Das, the vice president of enterprise computing at Nvidia. "The breakthrough of foundation models has triggered a tidal wave of research, fostered new startups and enabled new enterprise applications. Our collaboration with Microsoft will provide researchers and companies with state-of-the-art AI infrastructure and software to capitalize on the transformative power of AI."
Microsoft will equip Azure with tens of thousands of Nvidia graphics processing units, which are widely used in supercomputers for their ability to speed up AI and scientific applications. The systems will pair the GPUs with several other Nvidia technologies, including the Quantum-2 series of network switches. Azure already offers multiple compute instances powered by Nvidia's A100, which was the chipmaker's flagship data center GPU when it launched in 2020. The partnership will add Azure instances powered by the H100, Nvidia's current flagship data center GPU, which made its debut in March. The H100 features 80 billion transistors and can train AI models up to six times faster than the previous-generation A100. It also includes optimizations for running Transformer models, a type of advanced neural network widely used for natural language processing. The Quantum-2 InfiniBand switch series can process 400 gigabits of traffic per second per network port, twice as much as Nvidia's previous-generation hardware.
Software optimization is a major focus of the collaboration. Microsoft and Nvidia will optimize DeepSpeed, Microsoft's open-source toolkit that developers use to reduce neural networks' infrastructure requirements, for the H100. The effort will concentrate on helping developers speed up AI models built on the Transformer neural network architecture by using the Transformer Engine feature built into the H100. The Transformer Engine accelerates neural networks by using lower-precision number formats where accuracy allows, which reduces the amount of data they must process to complete calculations. Nvidia AI Enterprise will also be certified to run on Azure's new H100-powered instances. The software platform helps companies run AI applications on Nvidia's chips and includes preconfigured neural networks optimized for tasks such as generating shopping recommendations.
"Our collaboration with NVIDIA unlocks the world's most scalable supercomputer platform, which delivers state-of-the-art AI capabilities for every enterprise on Microsoft Azure," said Scott Guthrie, executive vice president of Microsoft's Cloud + AI Group. Nvidia will expand its use of Azure as part of the partnership, using Azure instances to support its research efforts in generative AI. The chipmaker is making significant investments in this area, having debuted MT-NLG in October, a generative AI system featuring 530 billion parameters that was described at the time as the most powerful in its category.