The term “application-specific integrated circuit” (ASIC) became popular in the 1990s, when such chips promised to bring down the cost per chip for a given application, such as mobile phones or Ethernet cards. An ASIC, by definition, meant developing hardware to solve a problem by building gates that implement the logic directly. These chips offered little programmability, but provided maximum performance at a given power and cost budget.
The Evolution of Deep Learning ASICs
It is hard to trace how the word ASIC came to describe deep learning chipsets. Perhaps the deep learning chipset industry adopted the term to differentiate itself from other chipsets. The ASIC movement for deep learning is driven entirely by startups and has been making headlines of late. Nervana Systems was recently acquired by Intel for $482 million. In addition, Graphcore announced that it had raised $30 million in capital, and Wave Computing came out of stealth mode to announce its product line. However, ASIC does not accurately describe these chipsets, which are essentially programmable architectures designed for deep learning applications.
Deep learning algorithms are highly computationally intensive. A convolutional neural network (CNN), for instance, repeats the convolution operation throughout the pipeline, and the number of operations can be enormous for 1080p or 4K images. These algorithms also tend to be highly parallel, which requires splitting data between different processing units and makes it important to connect the pipeline in the most efficient manner. In addition, a significant amount of data is transferred back and forth between the processing units and memory. Deep learning chipsets are designed to tackle these aspects and optimize performance, power, and memory.
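To give a rough sense of the scale involved, the sketch below counts the multiply-accumulate (MAC) operations in a single convolution layer applied to 1080p and 4K frames. The layer shape (3 input channels, 64 output channels, a 3x3 kernel, stride 1 with padding that preserves the frame size) and the function name `conv2d_macs` are illustrative assumptions, not figures from any of the chipsets discussed here.

```python
def conv2d_macs(height, width, in_channels, out_channels, kernel_size, stride=1):
    """Count multiply-accumulates for one 2D convolution layer.

    Assumes padding that keeps the output size at ceil(input / stride),
    which is a simplification chosen for illustration.
    """
    out_h = -(-height // stride)  # ceiling division
    out_w = -(-width // stride)
    # Each output value needs in_channels * kernel_size^2 multiply-accumulates.
    macs_per_output = in_channels * kernel_size * kernel_size
    return out_h * out_w * out_channels * macs_per_output


# Frame sizes as (height, width) in pixels.
RESOLUTIONS = {"1080p": (1080, 1920), "4K": (2160, 3840)}

# Hypothetical first layer: RGB input, 64 filters, 3x3 kernel.
macs = {
    name: conv2d_macs(h, w, in_channels=3, out_channels=64, kernel_size=3)
    for name, (h, w) in RESOLUTIONS.items()
}

for name, count in macs.items():
    print(f"{name}: {count / 1e9:.2f} billion MACs for one 3x3 layer")
```

Under these assumptions, one such layer already costs about 3.6 billion MACs per 1080p frame and four times that for 4K, before counting the many further layers a typical CNN stacks on top.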
All of these ASICs provide some sort of software engine to run a deep learning framework. Nervana developed its own framework called Neon, whereas Graphcore and Wave Computing both support TensorFlow. The ASIC companies are also developing their own boards and boxes that can be plugged into servers with minimal modifications. Application developers can write a deep learning algorithm, set some compile-time options, and continue to develop software just as they would on a central processing unit (CPU), completely oblivious to the underlying hardware.
Awaiting Application-Specific Functionality
So, ASIC really is a misnomer for these chipsets. If a deep learning algorithm were frozen at, say, 10 layers with additional blocks in between and then synthesized onto gates, that would be a true ASIC. That ASIC would then be dedicated to a particular function, such as locating an object in the sky or reading a road traffic sign. The ASIC would have very application-specific functionality and would offer only a few adjustable parameters. It would not have a software engine to make it useful for other applications, such as pedestrian detection. Due to the ever-changing nature of deep learning algorithms, we are not quite ready for such ASICs today.
One could argue that these chipsets are next-generation graphics processing units (GPUs) or architectures that are highly optimized for deep learning applications. While we continue to see huge improvements in deep learning algorithm performance via such chipsets, we are still a few years from coming up with true application-specific ASICs for deep learning applications.
Market Readiness in the Future
Currently, deep learning algorithms are still evolving rapidly, and no single neural network has been hardened to solve a single problem. Video decoder algorithms started in a similar fashion, eventually evolving into chipsets that synthesized entire algorithms into gates. Today’s market is not yet ready for a fully hardened deep learning ASIC. In time, we will see such hardening, and perhaps some standardization of neural network architecture, leading to fully synthesized ASICs.