Nvidia pays $20 billion to license Groq's inference technology, following Google's playbook of bringing specialized AI chip design and IP in-house.
Nvidia licensed Groq's inference technology for $20 billion last December and unveiled the resulting Groq 3 language processing unit (LPU) at GTC 2026 in San Jose. The deal, Nvidia's largest ever, brought Groq founder Jonathan Ross and other senior leaders to Nvidia, along with a perpetual license to Groq's patent portfolio and software stack. Groq continues to operate independently under new CEO Simon Edwards, but its technology now sits at the heart of Nvidia's inference roadmap. The move is the clearest admission yet from the world's dominant AI chipmaker that graphics processing units, however powerful, are not the right tool for every AI workload.
AI computing broadly splits into two phases: training, which builds a model by processing massive datasets over weeks or months, and inference, which runs the trained model to generate responses, images and decisions in real time. Nvidia's GPUs have dominated training because of their raw parallel processing power, but inference presents a fundamentally different computational profile. When a model generates output token by token, the bottleneck is usually not floating-point arithmetic but memory bandwidth: the speed at which a processor can move model weights and intermediate data through the chip. Nvidia's forthcoming Rubin GPU offers 288 gigabytes of high-bandwidth memory with 22 terabytes per second of bandwidth, whereas the Groq 3 LPU contains just 500 megabytes of on-chip SRAM but delivers 150 terabytes per second of bandwidth, roughly seven times the Rubin figure. That lean, speed-focused design is what lets the LPU generate tokens with predictable low latency.
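To make the bandwidth comparison concrete, the back-of-envelope sketch below uses only the two bandwidth figures and the 500-megabyte SRAM size quoted above. It rests on a simplifying assumption that is not from the article: in single-sequence (batch-1) generation, every weight byte is read once per token, so bandwidth alone caps tokens per second. The 70-billion-parameter, 8-bit model is purely hypothetical, and KV-cache traffic, batching, interconnect and compute limits are ignored.

```python
"""Back-of-envelope bandwidth arithmetic for the Rubin and Groq 3 LPU figures.

Assumption (not from the article): in batch-1 decode, every weight byte is
streamed through the processor once per generated token, so memory bandwidth
sets a hard ceiling on tokens per second. KV-cache reads, batching,
interconnect and compute limits are all ignored.
"""

TB = 1e12  # bytes (decimal units, as in vendor specifications)
GB = 1e9
MB = 1e6

RUBIN_BW = 22 * TB    # bytes/s, HBM bandwidth quoted for Rubin
LPU_BW = 150 * TB     # bytes/s, on-chip SRAM bandwidth quoted for the Groq 3 LPU
LPU_SRAM = 500 * MB   # bytes of on-chip SRAM per LPU


def stream_time_s(num_bytes: float, bandwidth: float) -> float:
    """Seconds needed to read num_bytes once at the given bandwidth."""
    return num_bytes / bandwidth


def decode_ceiling_tokens_per_s(weight_bytes: float, bandwidth: float) -> float:
    """Upper bound on batch-1 tokens/s if every weight byte is read per token."""
    return bandwidth / weight_bytes


# How many times more bandwidth the LPU has than Rubin.
ratio = LPU_BW / RUBIN_BW

# Time to stream 500 MB (one LPU's full SRAM) on each processor.
lpu_time = stream_time_s(LPU_SRAM, LPU_BW)
rubin_time = stream_time_s(LPU_SRAM, RUBIN_BW)

# Hypothetical 70B-parameter model stored at 1 byte per parameter (8-bit).
ceiling = decode_ceiling_tokens_per_s(70 * GB, RUBIN_BW)

if __name__ == "__main__":
    print(f"bandwidth ratio LPU/Rubin: {ratio:.1f}x")
    print(f"500 MB on LPU: {lpu_time * 1e6:.2f} us, on Rubin: {rubin_time * 1e6:.2f} us")
    print(f"batch-1 decode ceiling, 70 GB of weights on one Rubin: {ceiling:.0f} tokens/s")
```

Run as written, the ratio comes out at about 6.8x, consistent with "roughly seven times". The catch is capacity: the LPU can stream its 500 megabytes very quickly, but it holds far less than a single Rubin GPU.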
Rather than positioning the LPU as a GPU replacement, Nvidia built a heterogeneous architecture in which each processor handles a distinct phase of the inference pipeline. The Vera Rubin NVL72 rack system pairs 72 Rubin GPUs with the new Groq 3 LPX rack housing 256 interconnected LPUs, with Nvidia's Dynamo software layer orchestrating workloads between the two in real time. The move arrives amid intensifying competition. Google has run purpose-built tensor processing units for over a decade and recently introduced Ironwood, its seventh-generation TPU designed specifically for inference, while also training Gemini 3 entirely on TPUs to show that custom silicon can match or exceed GPU-based training pipelines. Amazon has deployed hundreds of thousands of Trainium chips across its data centers for both training and inference. The pattern is unmistakable: the three largest cloud providers are all building heterogeneous compute environments where specialized silicon handles specialized tasks.
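The idea of splitting one request across two kinds of processor can be illustrated with a toy router. This is a hypothetical sketch, not Nvidia Dynamo's API: the class names, the least-loaded policy and the phase split itself are assumptions. It uses the simplest split consistent with the article, sending compute-heavy prompt processing (prefill) to the GPU pool and token-by-token generation (decode) to the LPU pool; Nvidia's actual division of work may differ.

```python
"""Hypothetical sketch of phase-based routing in a GPU + LPU deployment.

NOT Nvidia Dynamo's API. Assumed policy: prefill (processing the prompt)
runs on the GPU pool, decode (generating tokens) runs on the LPU pool.
"""
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    PREFILL = "prefill"  # process the whole prompt in parallel
    DECODE = "decode"    # generate output tokens one at a time


@dataclass
class Device:
    name: str
    kind: str        # "gpu" or "lpu"
    queued: int = 0  # outstanding work items, used for load balancing


@dataclass
class Router:
    gpus: list[Device] = field(default_factory=list)
    lpus: list[Device] = field(default_factory=list)

    def pool_for(self, phase: Phase) -> list[Device]:
        # Assumed split: prefill -> GPUs, decode -> LPUs.
        return self.gpus if phase is Phase.PREFILL else self.lpus

    def dispatch(self, request_id: str, phase: Phase) -> Device:
        """Assign one phase of a request to the least-loaded device in its pool."""
        pool = self.pool_for(phase)
        if not pool:
            raise RuntimeError(f"no devices available for {phase.value}")
        device = min(pool, key=lambda d: d.queued)  # ties go to the first device
        device.queued += 1
        return device


if __name__ == "__main__":
    router = Router(
        gpus=[Device(f"gpu{i}", "gpu") for i in range(2)],
        lpus=[Device(f"lpu{i}", "lpu") for i in range(4)],
    )
    # Each request is prefilled once, then handed off for decoding.
    for req in ("a", "b", "c"):
        g = router.dispatch(req, Phase.PREFILL)
        d = router.dispatch(req, Phase.DECODE)
        print(req, "prefill ->", g.name, "| decode ->", d.name)
```

The sketch deliberately leaves out the hard part a real orchestrator must handle: moving each request's intermediate state (the KV cache) from the device that ran prefill to the device that runs decode.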
At GTC, Jensen Huang projected $1 trillion in orders for Blackwell and Vera Rubin systems through 2027 and argued that agentic AI will drive a fundamental shift in computing demand. Nvidia claims its combined GPU-plus-LPU architecture delivers up to 35 times higher inference throughput per megawatt and up to 10 times more revenue opportunity for trillion-parameter models than GPU-only deployments. These claims deserve scrutiny. The 35x throughput and 10x revenue figures apply to specific decode-heavy workloads, not to every inference scenario, and Nvidia's own deployment guidance suggests Groq LPUs should make up roughly 25% of a data center's total capacity. Because each LPU holds just 500 megabytes of SRAM, a full 256-chip rack offers only 128 gigabytes in total, so large models must have their weights distributed across every chip in the rack, which demands tight inter-chip coordination. Whether that architecture scales economically beyond Nvidia's controlled benchmarks remains unproven.
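The capacity constraint is easy to quantify from the per-chip and per-rack figures above. The sketch below computes lower bounds only, under assumptions that are not from the article: weights are split evenly with no replication, and KV cache, activations and runtime overhead are ignored. The trillion-parameter model and the 8-bit and 4-bit precisions are illustrative choices.

```python
"""Rough SRAM capacity arithmetic for the Groq 3 LPX rack figures.

Assumptions (not from the article): weights are split evenly with no
replication; KV cache, activations and runtime overhead are ignored, so
real deployments would need more chips than these lower bounds.
"""
import math

LPU_SRAM_BYTES = 500e6  # 500 MB of on-chip SRAM per Groq 3 LPU
LPUS_PER_RACK = 256     # LPUs in one Groq 3 LPX rack


def rack_sram_bytes() -> float:
    """Total SRAM in one LPX rack."""
    return LPU_SRAM_BYTES * LPUS_PER_RACK


def lpus_needed(params: float, bytes_per_param: float) -> int:
    """Minimum number of LPUs whose combined SRAM can hold the weights."""
    return math.ceil(params * bytes_per_param / LPU_SRAM_BYTES)


def racks_needed(params: float, bytes_per_param: float) -> int:
    """Minimum number of full LPX racks for the same weights."""
    return math.ceil(lpus_needed(params, bytes_per_param) / LPUS_PER_RACK)


if __name__ == "__main__":
    print(f"SRAM per LPX rack: {rack_sram_bytes() / 1e9:.0f} GB")
    # Hypothetical trillion-parameter model at two illustrative precisions.
    for bytes_per_param, label in ((1.0, "8-bit"), (0.5, "4-bit")):
        n = lpus_needed(1e12, bytes_per_param)
        racks = racks_needed(1e12, bytes_per_param)
        print(f"1T params @ {label}: {n} LPUs, {racks} racks")
```

Under these assumptions, a trillion-parameter model stored at 8 bits per weight needs at least 2,000 LPUs, or eight full racks, before any KV cache is stored. That is why inter-chip coordination, rather than raw per-chip bandwidth, becomes the deciding factor at this scale.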
Nvidia paid nearly three times Groq's September 2025 valuation of $6.9 billion, a steep premium for what is technically a non-exclusive licensing agreement rather than a full acquisition. The Groq 3 LPX rack, manufactured by Samsung, ships in the second half of 2026, while AMD is expected to respond with its own inference-focused announcements at Computex in June. For CXOs evaluating AI infrastructure, the Nvidia-Groq combination signals that procurement decisions are becoming more complex: the question is no longer simply how many GPUs to buy, but how to assemble a heterogeneous compute environment that matches the right processor to the right workload at the right cost.