Nvidia introduces NVLink Switch to enable higher-density GPU interconnects for scaled AI computing systems.
Data centers are the heart of the AI era, but keeping pace with exponentially growing performance demands requires a holistic design approach to overcome power and thermal limitations. Innovation spans compute architectures, memory, power sources, power distribution, and cooling solutions. However, networking is likely to have the most significant impact on both performance and latency, and it may change how compute workloads are processed. One of the most important innovations introduced this year is Nvidia's NVLink Switch for the GB200 NVL72 exascale rack computer system.
NVLink Switch is a crossbar network switch architecture that allows every port to communicate directly with any other port over NVLink, a high-speed, efficient compute interconnect. The initial NVLink Switch supported 50 gigabytes per second (GB/s) bi-directional, non-blocking communication links in the DGX-2 platform. Nvidia has continued enhancing both NVLink and NVLink Switch technologies. For the current Blackwell generation of GPUs and the GB200 NVL72 system, the 5th generation NVLink provides 100 GB/s per link. With 18 NVLink ports per Blackwell GPU, that translates to 1.8 terabytes per second (TB/s) of bandwidth per GPU. The GB200 NVL72 system rack has 18 NVLink Switches connecting 36 Nvidia Grace CPUs and 72 Blackwell GPUs, for a total non-blocking system communications bandwidth of 130 TB/s. Using NVLink Switches to connect across nodes allows for scaling up to 576 GPUs.
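The bandwidth figures above follow from simple multiplication. The short Python sketch below reproduces them using only the numbers quoted in the text; the constant and function names are illustrative, and decimal units (1 TB = 1,000 GB) are assumed.

```python
# Figures quoted in the text for 5th generation NVLink and the GB200 NVL72.
LINK_BANDWIDTH_GB_S = 100  # GB/s per NVLink link
LINKS_PER_GPU = 18         # NVLink ports on one Blackwell GPU
GPUS_PER_RACK = 72         # Blackwell GPUs in one GB200 NVL72 rack


def per_gpu_bandwidth_tb_s(link_gb_s: float, links: int) -> float:
    """Aggregate NVLink bandwidth of one GPU in TB/s (assumes 1 TB = 1000 GB)."""
    return link_gb_s * links / 1000


def rack_bandwidth_tb_s(gpu_tb_s: float, gpus: int) -> float:
    """Sum of per-GPU NVLink bandwidth across every GPU in the rack, in TB/s."""
    return gpu_tb_s * gpus


gpu_bw = per_gpu_bandwidth_tb_s(LINK_BANDWIDTH_GB_S, LINKS_PER_GPU)
rack_bw = rack_bandwidth_tb_s(gpu_bw, GPUS_PER_RACK)

print(f"Per GPU:  {gpu_bw:.1f} TB/s")
print(f"Per rack: {rack_bw:.1f} TB/s")
```

The per-rack result is 129.6 TB/s, which rounds to the 130 TB/s quoted for the system.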
These enhancements, combined with extensive system design, enable one of the densest server configurations available, translating to higher overall performance and efficiency. Existing data centers will not be able to replace all of their racks with this system because of its higher power and more complex infrastructure requirements, particularly liquid cooling, but they can run more AI and HPC workloads in a fraction of the space. New AI and HPC data centers can be designed with this space efficiency in mind, either to achieve a smaller footprint or to plan for the unique infrastructure requirements of a full-scale deployment.
The true value lies in meeting the continually increasing demands of AI and HPC workloads. According to Nvidia, the GB200 NVL72 can support models with up to 27 trillion parameters, exceeding the size of the largest current large language models (LLMs) for generative AI (GenAI), such as GPT-4 and GPT-4o. While there is a push to use these large models as foundation models to develop smaller, more optimized models, the largest models will continue to grow for applications like scientific analysis and in the drive toward artificial general intelligence (AGI). The resources of the GB200 NVL72 can also be partitioned to support multiple workloads, providing greater efficiency for both AI training and inference processing. The NVLink Switch represents an essential innovation for both scaling AI workloads and improving the efficiency of data centers to make AI more cost-effective.
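To give a sense of scale for the 27 trillion parameter figure, the rough Python sketch below estimates how much memory the model weights alone would occupy. The text does not say which numeric precision Nvidia's figure assumes, so the bytes-per-parameter values here are illustrative assumptions, and the estimate ignores activations, optimizer state, and any other runtime memory.

```python
PARAMETERS = 27e12  # 27 trillion parameters, the figure quoted in the text

# Assumed storage cost per parameter at common precisions (not from the text).
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}


def weight_footprint_tb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in TB (assumes 1 TB = 1e12 bytes)."""
    return params * bytes_per_param / 1e12


footprints = {
    precision: weight_footprint_tb(PARAMETERS, size)
    for precision, size in BYTES_PER_PARAM.items()
}

for precision, terabytes in footprints.items():
    print(f"{precision}: {terabytes:.1f} TB of weights")
```

Under these assumptions the weights range from 13.5 TB at 4-bit precision to 54 TB at 16-bit precision, far more than a single GPU can hold, which is why models of this size depend on a fast interconnect that lets many GPUs work together on one model.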