NVIDIA is positioning Groq as a specialized low-latency decode accelerator, integrated into its architecture much as Mellanox was.
NVIDIA's $20 billion non-exclusive licensing agreement with Groq, announced on Christmas Eve, is the company's largest investment to date as it seeks to expand its influence beyond GPU training. During the fiscal Q4 2026 earnings call, NVIDIA CEO Jensen Huang was asked about the company's plans for Groq's LPUs (Language Processing Units) and responded with a strategic vision he promised to detail at GTC. "With respect to how we think about Groq and the low latency decoder, I've got some great ideas that I'd like to share with you at GTC," he said, adding that NVIDIA would "extend our architecture with Groq as an accelerator in very much the way that we extended NVIDIA's architecture with Mellanox."
The acquisition targets a critical gap in NVIDIA's product portfolio. While the company has dominated the training space with Hopper and Blackwell processors, inference is an area where NVIDIA has yet to solidify its lead, particularly in latency-sensitive workloads where agentic AI applications require ultra-fast responses. Groq's LPUs use on-die SRAM to provide tens of terabytes per second of internal bandwidth, an approach already adopted by competitors like Cerebras with the WSE-3 and Microsoft with Maia 300. For agentic AI workloads, decode performance is critical: fast decode lets agents complete complex reasoning steps in mere seconds as the industry moves toward swarms of interdependent AI agents.
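A rough, roofline-style estimate helps show why internal bandwidth matters so much for decode. At batch size 1, each generated token requires reading roughly all of the model's weights once, so the token rate is bounded by memory bandwidth divided by model size in bytes. The sketch below is a back-of-envelope calculation, not a measurement: the function name, the 70B parameter count, the 1-byte weights, and both bandwidth figures are illustrative assumptions rather than numbers from NVIDIA or Groq, and the SRAM case assumes the model is sharded across enough chips to fit.

```python
def decode_tokens_per_second(params_billion, bytes_per_param, bandwidth_tb_s):
    """Upper bound on single-stream decode rate when weight reads dominate.

    Ignores KV cache reads, compute time, and interconnect overhead,
    so real throughput would be lower than this bound.
    """
    model_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_per_s = bandwidth_tb_s * 1e12
    return bandwidth_bytes_per_s / model_bytes


# Hypothetical 70B-parameter model stored at 1 byte per weight.
# Assumed 8 TB/s for an HBM-based device (illustrative only).
hbm_bound = decode_tokens_per_second(70, 1.0, 8.0)
# Assumed 80 TB/s of aggregate on-die SRAM bandwidth (illustrative only).
sram_bound = decode_tokens_per_second(70, 1.0, 80.0)

print(f"HBM-bound:  {hbm_bound:.0f} tokens/s")
print(f"SRAM-bound: {sram_bound:.0f} tokens/s")
```

Under these assumptions the bound scales linearly with bandwidth, which is the basic argument for SRAM-heavy designs in latency-sensitive decode.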
NVIDIA's approach parallels its Mellanox acquisition, which solved networking challenges and established "extreme co-design" principles that strengthened the company's datacenter strategy. According to GF Securities, NVIDIA could unveil an "LPX rack" at this year's GTC featuring 256 LPUs in a single rack. Two main integration strategies are being explored: hybrid compute nodes within rack-scale offerings, with multiple LPUs connected via a unified interconnect, or on-die LPUs integrated within Feynman GPUs through hybrid bonding. If the rack-scale approach prevails, LPU-to-LPU connections would likely use native plesiosynchronous chip-to-chip protocols, while LPU-to-GPU communication could employ NVLink Fusion to handle the massive KV cache that GPUs offload during the prefill phase.
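To give a sense of how large a KV cache offload can be, the following sketch computes the cache size for a single request and the time to move it across a link. It is a minimal illustration under stated assumptions: the model shape (80 layers, 8 KV heads, head dimension 128), the 32,768-token context, the 2-byte values, and the 900 GB/s link rate are all hypothetical and are not specifications of any NVIDIA or Groq product.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value):
    """Size of the KV cache for one sequence.

    The factor of 2 accounts for one key tensor and one value tensor per layer.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


def transfer_seconds(num_bytes, link_gb_per_s):
    """Idealized transfer time over a link, ignoring latency and protocol overhead."""
    return num_bytes / (link_gb_per_s * 1e9)


# Hypothetical model shape and context length (assumptions for illustration).
cache = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                       seq_len=32768, bytes_per_value=2)
# Assumed 900 GB/s GPU-to-LPU link rate (illustrative only).
seconds = transfer_seconds(cache, link_gb_per_s=900)

print(f"KV cache: {cache / 2**30:.1f} GiB")
print(f"Transfer: {seconds * 1e3:.1f} ms")
```

With these numbers a single long-context request carries about 10 GiB of cache, so the link between prefill and decode hardware sits directly on the latency path.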
NVIDIA's hybrid architecture would combine LPUs for decoding with existing GPU capabilities for the prefill stage, drawing on Vera Rubin's attention-acceleration engines and NVFP4 compute, and positions the company to lead in inference performance. Jensen Huang recently disclosed that compute and revenue growth at NVIDIA are now expanding in lockstep at a 1:1 ratio, driven by aggressive evolution at the application layer of AI. With formal announcements expected at this year's GTC, NVIDIA's integration of Groq's technology promises to give the company a decisive advantage in addressing the latency challenges that have emerged as a fundamental bottleneck for compute providers.