Cerebras partners with Qualcomm to integrate Qualcomm IP and expand wafer-scale AI processor distribution.
Cerebras Systems has launched the CS3, its third-generation commercial wafer-scale AI processor, built on the TSMC 5nm process and available immediately. The new system delivers twice the performance of its predecessor, the CS2, at the same footprint, power consumption, and cost. The CS3's WSE-3 wafer carries four trillion transistors, 900,000 AI cores, and 44 GB of fast on-wafer memory (approximately 10X faster than HBM), all interconnected across the wafer to shorten processing times for generative AI. The Cerebras software stack lets AI problems scale efficiently across CS3 clusters with a fraction of the development effort needed to distribute them across clusters of accelerators, which has helped the company earn support from organizations like the Mayo Clinic and GlaxoSmithKline.
The CS3's architecture addresses the memory challenges that GPU-based clusters face. Rather than relying on expensive High Bandwidth Memory modules and 3D chip stacking, the system complements its faster on-wafer SRAM with a separate memory server called MemoryX, which serves parameters from a 2.4-petabyte appliance. According to Cerebras, a single CS3 system, housed in a single rack, can train larger AI models than a cluster of 10,000 GPUs or Google TPUs. This eliminates the need for the massive infrastructure traditionally required for large-scale AI training.
Cerebras is collaborating with UAE-based G42 to build a distributed network of nine data centers equipped with Cerebras technology. The next installation in this Galaxy constellation, Condor Galaxy 3, is currently being built in Dallas using the new CS3 servers. The two companies are on track to complete all nine Galaxy supercomputers by the end of this year, creating a massive AI system for internal G42 use and to provide cloud services.
To address inference challenges, Cerebras has partnered with Qualcomm, which has developed the Cloud AI 100, an inference appliance that leads competitors in MLPerf benchmark tests for energy efficiency and has garnered support from partners including AWS and HPE. The collaboration applies three optimization techniques developed by Qualcomm AI Research (sparsity, speculative decoding, and MX6 compression) to Cerebras' CS3 training stack. By using these inference-target-aware optimizations, the two companies claim to cut the cost of inference by 10X. As noted in the collaboration, "We have always thought that the industry needs to cut inference costs by two orders of magnitude by the end of this decade. Now, 10X has already been achieved, so I think we underestimated the pace of innovation."
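For readers unfamiliar with speculative decoding, one of the three techniques named above, the following is a minimal sketch of its greedy variant. It is not the Qualcomm or Cerebras implementation, which the announcement does not describe. The function names, the toy "models," and the parameter k are all hypothetical, and real systems verify the drafted tokens in a single batched pass of the large model rather than one call per token.

```python
from typing import Callable, List

Token = int
# Assumption for illustration: a "model" is any function that maps the
# tokens so far to the single most likely next token (greedy decoding).
Model = Callable[[List[Token]], Token]


def greedy_decode(model: Model, prompt: List[Token], n: int) -> List[Token]:
    """Baseline: generate n tokens one at a time with a single model."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[Token],
                       n: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: a cheap draft model proposes up to k
    tokens, and the expensive target model keeps the longest prefix it
    agrees with, substituting its own token at the first disagreement."""
    out = list(prompt)
    goal = len(prompt) + n
    while len(out) < goal:
        # 1. The draft model guesses several tokens ahead.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target model checks each guess in order.
        for i, guess in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != guess:
                # Keep the agreed prefix plus the target's correction.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # Every guess matched, so accept the whole proposal.
            out.extend(proposal)
    return out


if __name__ == "__main__":
    # Toy stand-ins: the draft agrees with the target most of the time.
    target = lambda ctx: (ctx[-1] + 1) % 5
    draft = lambda ctx: (ctx[-1] + 1) % 5 if len(ctx) % 3 else 0
    print(speculative_decode(target, draft, [0], 10))
```

The point of the design is that the output is identical to what the target model would have produced on its own. Any savings come from the target model confirming several drafted tokens at once instead of generating each one sequentially.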
By partnering with Qualcomm, Cerebras can now provide an end-to-end high-performance AI platform spanning both training and inference processing. This alliance allows the companies to address customers' entire AI workflow with an optimized solution, eliminating the need for Cerebras to partner with Nvidia to achieve similar results. As the AI industry moves from research to practical application and from cloud to edge deployment, this comprehensive approach positions both companies to serve the evolving needs of the market.