
Wafer-scale accelerators could redefine AI

UCR engineers examine hardware needed for AI’s growing demands

Author: David Danelski
June 16, 2025

A technology review paper published by UC Riverside engineers in the journal Device explores the promise of a new type of computer chip that could reshape the future of artificial intelligence while being more environmentally friendly.

Known as wafer-scale accelerators, these massive chips, made by companies such as Cerebras, are built on dinner-plate-sized silicon wafers, in stark contrast to traditional graphics processing units, or GPUs, which are no bigger than a postage stamp.

The peer-reviewed paper by a cross-disciplinary UCR team concludes that wafer-scale processors can deliver far more computing power with much greater energy efficiency—traits that are needed as AI models grow ever larger and more demanding.

Mihri Ozkan

“Wafer-scale technology represents a major leap forward,” said Mihri Ozkan, a professor of electrical and computer engineering in UCR’s Bourns College of Engineering and the paper’s lead author. “It enables AI models with trillions of parameters to run faster and more efficiently than traditional systems.”

GPUs became essential tools for AI development because they can perform many computations at once, making them ideal for processing images, language, and data streams in parallel. Executing thousands of operations simultaneously is what lets driverless cars interpret the world around them to avoid collisions, image generators turn text into pictures, and ChatGPT suggest dozens of meal recipes from a specific list of ingredients.
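
That difference is easy to see in miniature. The Python sketch below is an illustration rather than anything from the paper; it contrasts element-by-element processing with the single-instruction, many-data style of work a GPU spreads across thousands of cores at once:

```python
import numpy as np

# One million brightness values, processed two ways.
pixels = np.random.rand(1_000_000)

# Sequential: one value at a time, the way a single processor core works.
adjusted_loop = np.empty_like(pixels)
for i in range(pixels.size):
    adjusted_loop[i] = pixels[i] * 1.5 + 0.1

# Data-parallel: the same operation applied to every element at once,
# conceptually how a GPU spreads identical work across thousands of cores.
adjusted_vec = pixels * 1.5 + 0.1

# Both approaches produce the same result; only the execution model differs.
assert np.allclose(adjusted_loop, adjusted_vec)
```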

But as AI model complexity increases, even high-end GPUs are starting to hit performance and energy limits.

“AI computing isn’t just about speed anymore,” Ozkan said. “It’s about designing systems that can move massive amounts of data without overheating or consuming excessive electricity.”

The UCR analysis compares today’s standard GPU chips with wafer-scale systems like the Cerebras Wafer-Scale Engine 3 (WSE-3), which contains 4 trillion transistors and 900,000 AI-specific cores on a single wafer. Tesla’s Dojo D1, another example, includes 1.25 trillion transistors and nearly 9,000 cores per module. These systems are engineered to eliminate the performance bottlenecks that occur when data must travel between multiple smaller chips.

“By keeping everything on one wafer, you avoid the delays and power losses from chip-to-chip communication,” Ozkan said.
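
A rough latency model illustrates the point. The figures in the sketch below are illustrative assumptions, not measurements from the paper; they are chosen only to show how off-chip hops come to dominate once a workload is split across many small chips:

```python
# Toy model of where time goes when a fixed amount of data must move
# between compute units. All numbers are illustrative assumptions.

data_gb = 100.0                  # data exchanged per training step
num_transfers = 1_000            # discrete transfers per step

# Multi-GPU cluster: traffic crosses chip-to-chip links.
link_gb_per_s = 50.0             # assumed off-chip link bandwidth
link_latency_s = 5e-6            # assumed per-transfer latency
multi_chip_s = data_gb / link_gb_per_s + num_transfers * link_latency_s

# Wafer-scale: the same traffic stays on a much faster on-wafer fabric.
fabric_gb_per_s = 5_000.0        # assumed on-wafer fabric bandwidth
fabric_latency_s = 5e-8          # assumed on-wafer hop latency
wafer_s = data_gb / fabric_gb_per_s + num_transfers * fabric_latency_s

print(f"multi-chip: {multi_chip_s:.3f} s per step")
print(f"wafer-scale: {wafer_s:.5f} s per step")
```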

The paper also highlights technologies such as chip-on-wafer-on-substrate (CoWoS) packaging, which could make wafer-scale designs more compact and easier to scale, with a potential 40-fold increase in computational density.

While these systems offer substantial advantages, they’re not suited for every application. Wafer-scale processors are costly to manufacture and less flexible for smaller-scale tasks. Conventional GPUs, with their modularity and affordability, remain essential in many settings.

“Single-chip GPUs won’t disappear,” Ozkan said. “But wafer-scale accelerators are becoming indispensable for training the most advanced AI models.”

The paper also addresses a growing concern in AI: sustainability. GPU-powered data centers use enormous amounts of electricity and water to stay cool. Wafer-scale processors, by reducing internal data traffic, consume far less energy per task.

For example, the Cerebras WSE-3 can perform up to 125 quadrillion operations per second while using a fraction of the power required by comparable GPU systems. Its architecture keeps data local, lowering energy draw and thermal output.

Meanwhile, NVIDIA's H100 GPU, the backbone of many modern data centers, offers flexibility and high throughput, but at greater energy cost. It delivers an efficiency of about 7.9 trillion operations per second per watt and requires extensive cooling infrastructure, often involving large volumes of water.
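
Putting this story's own figures side by side gives a rough sense of the gap. The sketch below divides the WSE-3's quoted peak throughput by the 10,000-watt thermal design power mentioned later in this article; treating thermal design power as actual draw, and peak throughput as sustained throughput, are both simplifying assumptions:

```python
# Back-of-envelope efficiency comparison using figures quoted in this article.
# Assumes the WSE-3 draws its full thermal design power at peak throughput,
# which is a simplification.

wse3_ops_per_s = 125e15     # 125 quadrillion operations per second
wse3_watts = 10_000         # thermal design power cited later in the article

wse3_ops_per_watt = wse3_ops_per_s / wse3_watts
h100_ops_per_watt = 7.9e12  # efficiency figure quoted above for the H100

print(f"WSE-3: {wse3_ops_per_watt / 1e12:.1f} trillion ops/s per watt")
print(f"H100:  {h100_ops_per_watt / 1e12:.1f} trillion ops/s per watt")
print(f"ratio: {wse3_ops_per_watt / h100_ops_per_watt:.1f}x")
```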

“Think of GPUs as busy highways—effective, but traffic jams waste energy,” Ozkan said. “Wafer-scale engines are more like monorails: direct, efficient, and less polluting.”

Cerebras reports that inference workloads on its WSE-3 system use one-sixth the power of equivalent GPU-based cloud setups. The technology is already being used in climate simulations, sustainable engineering, and carbon-capture modeling.

“We’re seeing wafer-scale systems accelerate sustainability research itself,” Ozkan said. “That’s a win for computing and a win for the planet.”

However, heat remains a challenge. With thermal design power reaching 10,000 watts, wafer-scale chips require advanced cooling. Cerebras employs a glycol-based loop built into the chip package, while Tesla uses a coolant system that distributes liquid evenly across the chip surface.
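
Basic heat-transfer arithmetic shows what a 10,000-watt load demands of a liquid loop. In the sketch below, the coolant properties and the temperature rise across the loop are illustrative assumptions, not specifications from Cerebras or Tesla:

```python
# How much coolant flow does it take to carry away a 10,000 W heat load?
# Steady-state heat balance: Q = mass_flow * c_p * delta_T.
# Fluid properties and temperature rise are illustrative assumptions.

heat_load_w = 10_000    # thermal design power cited in the article
c_p = 3600.0            # J/(kg*K), assumed for a water-glycol mix
delta_t = 10.0          # K, assumed temperature rise across the loop
density = 1050.0        # kg/m^3, assumed for the mix

mass_flow = heat_load_w / (c_p * delta_t)               # kg/s
volume_flow_lpm = mass_flow / density * 1000.0 * 60.0   # liters per minute

print(f"required flow: {mass_flow:.2f} kg/s, about {volume_flow_lpm:.0f} L/min")
```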

The authors also emphasize that up to 86% of a system’s total carbon footprint can come from manufacturing and supply chains, not just energy use. They advocate for recyclable materials and lower-emission alloys, along with full lifecycle design practices.
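
A short lifecycle sketch shows what an 86% manufacturing share implies. The operational inputs below are illustrative assumptions rather than figures from the paper; the point is only how large the embodied footprint must be relative to the electricity a system uses over its life:

```python
# If 86% of lifetime carbon comes from manufacturing and supply chains,
# how big is the embodied footprint compared with operational emissions?
# All operational inputs are illustrative assumptions.

power_kw = 10.0               # assumed average draw
hours_per_year = 8_760
years = 4                     # assumed service life
grid_kg_co2_per_kwh = 0.4     # assumed grid carbon intensity

operational_kg = power_kw * hours_per_year * years * grid_kg_co2_per_kwh

embodied_share = 0.86         # figure cited in the article
embodied_kg = operational_kg * embodied_share / (1 - embodied_share)

print(f"operational emissions: {operational_kg / 1000:.0f} t CO2")
print(f"implied embodied emissions: {embodied_kg / 1000:.0f} t CO2")
```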

“Efficiency starts at the factory,” Ozkan said. “To truly lower computing’s impact, we need to rethink the whole process—from wafer to waste.”

The article, “Performance, Efficiency, and Cost Analysis of Wafer-Scale AI Accelerators vs. Single-Chip GPUs,” is freely available in the Cell Press journal Device.

In addition to Ozkan, co-authors include UCR graduate students Lily Pompa, Md Shaihan Bin Iqbal, Yiu Chan, Daniel Morales, Zixun Chen, Handing Wang, Lusha Gao, and Sandra Hernandez Gonzalez.

“This review is the result of a deep interdisciplinary collaboration,” Ozkan said. “We hope it serves as a roadmap for researchers, engineers, and policymakers navigating the future of AI hardware.”

Header photo: A technician holds a wafer-scale computer chip. (Photo/Cerebras)
