AMD develops next-generation UDNA GPU architecture using 2.5D/3.5D chiplet configurations.
AMD's current-generation Radeon RX 9000-series lineup, based on the RDNA 4 architecture, does not attempt to challenge Nvidia in the high-end desktop GPU market: its range-topping Radeon RX 9070 XT rivals Nvidia's mid-range GeForce RTX 5070 Ti. However, according to the LinkedIn profile of Laks Pappu, senior fellow and chief system-on-chip (SoC) architect at AMD, the company's graphics division may have significant developments in the pipeline. Pappu appears to be in charge of AMD's data center GPU development as well as Radeon product architecture for cloud gaming and the Navi 4x and Navi 5x generations. He describes his job as "building next-generation competitive 2.5D/3.5D chiplet-based and monolithic graphics SoCs on various packaging technologies," which implies that AMD's next-generation graphics processors will use both monolithic and multi-chiplet arrangements.
Laks Pappu joined AMD in August 2022 after more than 25 years at Intel, where he was in charge of Intel's discrete graphics processors codenamed DG1, Alchemist, and Battlemage. He also explored "multi-tile GPUs" for high-end graphics cards, though for now dual-GPU Battlemage products are aimed at AI workloads rather than graphics. High-end GPUs typically follow a 2.5 to 3.5-year development cycle: architecture definition and block-level planning take about a year, physical implementation takes another 1 to 1.5 years depending on design complexity and transistor count, and tape-out and silicon bring-up take another year. Given that timeline, Pappu likely had significant influence on the physical implementation of RDNA 4 and CDNA 4 products such as the Radeon RX 9000-series and Instinct MI350-series. The Navi 5x generation and the probable Instinct MI500-series generation would be the first architectures he has led from the ground up, giving him full-cycle influence.
Building multi-tile consumer GPUs is extremely challenging due to the tightly coupled nature of graphics processing workloads and the need for ultra-fast, low-latency communication between processing units. Unlike CPUs, which can tolerate some latency across cores or chiplets, GPUs rely on thousands of parallel threads that must coordinate precisely and quickly within warps or thread groups. Disaggregating shader cores across multiple dies introduces synchronization overhead, latency penalties, and complex coherency requirements that can significantly reduce performance or increase power consumption. Software and drivers must also present the multi-tile GPU as a single, unified device to operating systems and game engines, adding another layer of complexity. These architectural, manufacturing, and software hurdles have kept multi-tile designs mostly confined to data center and HPC GPUs.
However, as it becomes harder and more expensive to build large gaming GPUs like Nvidia's GB202, it may finally make sense to build consumer-oriented multi-tile graphics processors. AMD was the first company to use multi-chiplet designs for data center and consumer CPUs, and it would not be a surprise if it disaggregated its graphics processing units in the future. In fact, AMD's Radeon RX 7900-series Navi 31 processors already feature a disaggregated design consisting of one main graphics core die (GCD) and six cache/memory controller/PHY chiplets. The symmetrical floor plan of Navi 31's GCD means the chip could potentially be "halved" if AMD figures out how to disaggregate the design at the logical level and make software treat it as a monolithic GPU. That flexibility has already enabled AMD to create multiple product tiers from a single design, including the Radeon RX 7900 XTX, RX 7900 XT, RX 7900 GRE, and RX 7900M.