The largest chip in history
A few days ago, the startup Cerebras unveiled the largest chip ever built at the IEEE Hot Chips symposium at Stanford University. According to the company, the chip, which is roughly the size of an entire silicon wafer, is designed to cut AI training time from months to minutes.
This is the first commercial attempt at a wafer-scale processor since Trilogy Systems failed in the 1980s.
Next, let us talk about this chip.
Data
As the largest chip ever made, Cerebras' Wafer Scale Engine (WSE) naturally racks up a string of superlatives. Here are some of them:
Size: 46,225 square millimeters. That is only about 75% of a sheet of US letter paper, but 56 times the size of the largest GPU.
Transistors: 1.2 trillion. Nvidia's GV100 Volta has 21.1 billion.
Processor cores: 400,000. The GV100 has 5,660.
Memory: 18 gigabytes of on-chip SRAM, roughly 3,000 times as much as the GV100.
Memory bandwidth: 9 petabytes per second. According to Cerebras, that is 10,000 times more than our favorite GPU.
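The comparisons above can be sanity-checked with a few lines of Python. This is a quick sketch only: the GV100 die area (815 mm²), the GV100 transistor count (21.1 billion), and the US letter dimensions (8.5 × 11 inches) are reference values assumed here, not figures supplied by Cerebras.

```python
MM_PER_INCH = 25.4

# Figures from the article.
WSE_AREA_MM2 = 46_225
WSE_TRANSISTORS = 1.2e12
WSE_CORES = 400_000
GV100_CORES = 5_660

# Reference values assumed here (not taken from the article).
GV100_AREA_MM2 = 815
GV100_TRANSISTORS = 21.1e9
LETTER_AREA_MM2 = (8.5 * MM_PER_INCH) * (11 * MM_PER_INCH)

# Ratio of the WSE figure to each reference figure.
ratios = {
    "area vs. letter paper": round(WSE_AREA_MM2 / LETTER_AREA_MM2, 2),
    "area vs. GV100": round(WSE_AREA_MM2 / GV100_AREA_MM2, 1),
    "transistors vs. GV100": round(WSE_TRANSISTORS / GV100_TRANSISTORS, 1),
    "cores vs. GV100": round(WSE_CORES / GV100_CORES, 1),
}

for name, value in ratios.items():
    print(f"{name}: {value}")
```

The script reports 0.77 of a letter-size sheet and 56.7 times the GV100's die area, in line with the roughly 75% and 56× figures in the list.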
Why do we need this monster?
Cerebras makes a fairly convincing case in its white paper for why such a large chip makes sense.
Basically, the company believes that the demand for training deep learning and other AI systems has gotten out of hand. Training is what produces new models, for example a system that, once trained, can recognize people or win at Go. But training has typically taken weeks or months and cost thousands to millions of dollars in computing time. That cost leaves little room for experimentation, which stifles new ideas and innovation.
The company's answer is that the world needs more, and cheaper, computing resources for training, and that training should take minutes rather than months. Getting there requires more cores, more memory close to those cores, and low-latency, high-bandwidth connections between the cores.
These are goals that matter to everyone in the AI industry, but Cerebras admits it has pushed the idea to its logical extreme. A bigger chip provides more silicon area for processor cores and the memory they depend on. High-bandwidth, low-latency connections are only achievable when data never has to leave the short, dense interconnects on the chip. That is why they built such a big chip.
What is inside these 400,000 cores?
According to the company, the WSE's cores are specialized for artificial intelligence but remain programmable, so the chip is not limited to AI applications. The company calls them Sparse Linear Algebra (SLA) cores. These processing units are optimized for the "tensor" operations at the heart of AI workloads, but they also include a feature that reduces the amount of work, particularly for deep learning networks. According to the company, 50% to 98% of the data in deep learning training is zero; in other words, the data is "sparse".
The SLA core reduces the workload simply by never multiplying anything by zero. The core has built-in dataflow elements that trigger computation based on the data itself, so when it encounters a zero, no time is wasted.
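To make the zero-skipping idea concrete, here is a toy Python sketch of a dot product that issues a multiply only when both operands are non-zero and counts how many multiplies it actually performed. The function name `sparse_dot` is hypothetical, and the sketch illustrates the general principle only; it is not how the SLA core is implemented in hardware.

```python
from typing import Sequence, Tuple


def sparse_dot(weights: Sequence[float], activations: Sequence[float]) -> Tuple[float, int]:
    """Return (dot product, number of multiplications actually performed).

    Toy model of zero-skipping: a multiply-accumulate is issued only when
    both operands are non-zero, since anything times zero adds nothing.
    """
    if len(weights) != len(activations):
        raise ValueError("vectors must have the same length")
    total = 0.0
    multiplies = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # skip: no work is triggered for a zero operand
        total += w * a
        multiplies += 1
    return total, multiplies


# Example: an activation vector in which 6 of 8 entries are zero.
activations = [0.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
weights = [0.3, 2.0, -1.0, 0.5, 0.25, 4.0, 1.0, 0.7]

result, work = sparse_dot(weights, activations)
print(result, work)  # 3.5 2
```

Only 2 of the 8 possible multiplications are performed here; the fraction of skipped work tracks the fraction of zeros in the data.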
How did they manage it?
The basic idea behind Cerebras' giant single chip has been around for decades, but it has long been considered impractical.
Back in the 1980s, Gene Amdahl, a pioneer of parallel computing, devised a plan to speed up mainframe computing: a processor the size of a silicon wafer. Keeping most of the data on the processor itself, rather than sending it across a circuit board to memory or other chips, would make computation faster and more energy efficient.
With $230 million raised from venture capitalists, Amdahl founded Trilogy Systems to make that vision a reality. But the first commercial attempt at "wafer-scale integration" was, admittedly, a disaster. According to one report at the time, it was credited with introducing the verb "to crater" into the vocabulary of financial journalism.
The most fundamental problem is that the bigger the chip, the worse the yield. Logically, this should make wafer-scale chips unprofitable, because every one of them would contain defects. Cerebras' solution is to build in a certain amount of redundancy. According to EE Times, the Swarm communication network has redundant links that let the chip route around damaged cores and keep working. About 1% of the cores are reportedly held in reserve as spares.
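The yield problem and the spare-core fix can be illustrated with a simple probability model. This sketch is a toy built on assumptions, not Cerebras' published analysis: it treats each core as failing independently with a hypothetical probability of 1 in 10,000, compares a wafer with no spares against one with 1% spares, and uses a 1-D remapping as a stand-in for the rerouting the Swarm network performs in two dimensions. All function names are hypothetical.

```python
import math


def poisson_cdf(k_max: int, lam: float) -> float:
    """P(X <= k_max) for X ~ Poisson(lam), summed in log space to avoid underflow."""
    if lam == 0:
        return 1.0
    total = 0.0
    for k in range(k_max + 1):
        total += math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return min(total, 1.0)


def wafer_yield(n_cores: int, p_defect: float, spare_fraction: float) -> float:
    """Probability that no more cores are defective than there are spares.

    Assumes each core fails independently with probability p_defect and uses
    the Poisson approximation to the binomial (reasonable when p_defect is small).
    """
    spares = round(n_cores * spare_fraction)
    return poisson_cdf(spares, n_cores * p_defect)


def remap_cores(n_physical: int, n_logical: int, defective: set[int]) -> list[int]:
    """Assign logical core ids to working physical cores, skipping defective ones.

    A 1-D simplification: real hardware would reroute within a 2-D mesh.
    """
    working = [c for c in range(n_physical) if c not in defective]
    if len(working) < n_logical:
        raise ValueError("not enough working cores")
    return working[:n_logical]


N_CORES = 400_000
P_DEFECT = 1e-4  # hypothetical per-core defect probability

print(wafer_yield(N_CORES, P_DEFECT, 0.0))   # no spares: about 4e-18
print(wafer_yield(N_CORES, P_DEFECT, 0.01))  # 1% spares: effectively 1.0
print(remap_cores(10, 8, {2, 5}))            # [0, 1, 3, 4, 6, 7, 8, 9]
```

Under these made-up numbers, a wafer with no spares almost never works, whereas with 1% of cores held in reserve almost every wafer does; the exact figures depend entirely on the assumed defect rate.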
Cerebras also had to work around some key manufacturing constraints. For example, chipmaking tools are designed to project their feature-defining patterns onto a relatively small rectangle and to repeat that pattern precisely across the wafer. Because projecting different patterns at different locations on a wafer is costly and difficult, this constraint alone would keep many systems from being built on a single wafer.
The WSE, however, is laid out like a typical wafer, made up entirely of identical chips. The biggest difference is that Cerebras worked with TSMC to develop a way to build connections across the spaces between the chips, known as scribe lines. These areas are normally left blank because the wafer is cut along them.
According to TechCrunch, Cerebras also had to invent ways to deliver 15 kW of power to the chip and to cool it, and to design new connectors that cope with the way the chip expands when it heats up.
Is this the only way to build a wafer-scale computer?
Of course not. For example, a team at UCLA and the University of Illinois Urbana-Champaign is working on a similar system that is built from bare processor dies and has already been tested. The dies are mounted on a patterned silicon wafer that carries a dense interconnect network. This concept, called a silicon interconnect fabric, lets the dies sit very close together (100 microns apart), so that communication between them approaches the characteristics of communication within a single chip.
"This is the research we have been validating all this time," said Rakesh Kumar of the University of Illinois.
Kumar believes the silicon interconnect approach has some advantages over Cerebras' monolithic wafer-scale approach. First, it lets designers mix and match technologies and use the best manufacturing process for each one. The monolithic approach means choosing the process best suited to the logic of the most critical subsystem and then using it for memory and other components too, even when it is not a good fit for them.
Kumar also suggests that Cerebras' approach limits the amount of memory it can put on the processor. "They have 18 gigabytes of SRAM on the wafer. Maybe that is enough for some models today, but what about tomorrow and the day after tomorrow?"
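Kumar's concern can be put into rough numbers. This sketch assumes decimal gigabytes and counts model weights only; the choice of 32-bit and 16-bit formats is illustrative and not something the article specifies.

```python
# Rough capacity check for on-chip memory.
# Assumptions: 18 GB means 18 * 10**9 bytes, and only the model weights are
# stored (no activations, gradients, or optimizer state).
SRAM_BYTES = 18 * 10**9
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def max_params(bytes_available: int, bytes_per_param: int) -> int:
    """Largest number of parameters whose weights alone fit in the given memory."""
    return bytes_available // bytes_per_param


for fmt, size in BYTES_PER_PARAM.items():
    print(f"{fmt}: {max_params(SRAM_BYTES, size) / 1e9:.1f} billion parameters")
```

Even under these generous assumptions the ceiling is a few billion parameters (4.5 billion in fp32, 9.0 billion in fp16), and training also needs memory for gradients and optimizer state, so the practical limit is lower.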
When will it come out?
According to Fortune magazine, Cerebras will ship its first systems to customers in September. According to EE Times, some customers have already received prototypes. The company plans to announce results from the complete system at the Supercomputing Conference in November.