NVIDIA Releases the Fastest Deep Learning System Yet


At this year’s GPU Technology Conference, NVIDIA announced the DGX-1, a new product billed as “the world’s first deep learning supercomputer in a box.” We sat down with Jim McHugh, the executive in charge of the product. McHugh brings a software background to the hardware company, having previously led development of the Solaris operating system at Sun Microsystems.

NVIDIA has placed an enormous bet on this initiative, spending almost $2 billion to develop the DGX-1 and the associated Pascal chipset. But, in Tractica’s analysis, it’s a bet that is likely to pay off.

What is the DGX-1?

The DGX-1 is a purpose-built deep learning computer with integrated hardware and software. It performs up to 170 trillion floating point operations per second (170 teraflops). The DGX-1 ships in June in the United States and in 3Q 2016 in the rest of the world, priced at $129,000.

How Does It Work?

At the heart of the computer are eight Tesla P100 graphics processing unit (GPU) accelerators with 16 gigabytes of memory each, NVIDIA NVLink, and a deep learning software development kit (SDK). To reach its rated performance, the DGX-1 connects the eight GPUs in a hybrid cube mesh. Memory bandwidth is 3 times that of the previous NVIDIA Maxwell architecture, and interconnect bandwidth is 5 times higher with NVLink. NVLink is bidirectional, allowing users to scale applications across multiple central processing units (CPUs) and GPUs. NVIDIA also cites new artificial intelligence (AI) algorithms built on half-precision instructions, which deliver more than 21 trillion floating point operations per second per GPU. Energy efficiency has improved as well: each chip has 15.3 billion transistors fabricated on a 16 nanometer FinFET process, which McHugh describes as the best ever achieved.
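As a rough consistency check on the figures above, eight accelerators at about 21 teraflops each should land close to the 170 teraflops quoted for the whole system. The following is a minimal Python sketch, assuming the 21-trillion-per-second figure applies to each GPU; the constant and function names are ours, for illustration only.

```python
# Back-of-the-envelope check: does 8 x 21 teraflops match the quoted 170?
# Assumption: the 21-teraflop figure is a per-GPU peak, as quoted in the article.
GPUS_PER_SYSTEM = 8
TFLOPS_PER_GPU = 21.0
QUOTED_SYSTEM_TFLOPS = 170.0


def aggregate_tflops(gpus: int, tflops_per_gpu: float) -> float:
    """Return the naive aggregate peak, ignoring any scaling losses."""
    return gpus * tflops_per_gpu


total = aggregate_tflops(GPUS_PER_SYSTEM, TFLOPS_PER_GPU)
# Relative gap between the naive product and the quoted system figure.
gap = abs(total - QUOTED_SYSTEM_TFLOPS) / QUOTED_SYSTEM_TFLOPS
print(f"{total:.0f} teraflops aggregate, {gap:.1%} from the quoted figure")
```

The naive product comes to 168 teraflops, within about 1 percent of the quoted system figure, which is consistent with the per-GPU number having been rounded down.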

The DGX-1 system uses a plug-and-play approach, which is attractive to enterprises that aspire to use deep learning but don’t have the experience of companies like Google, Facebook, and Microsoft, which have been building their own systems. The software stack includes libraries for deep learning primitives, linear algebra, sparse matrices, and multi-GPU communications, as well as the complete CUDA C/C++ development environment.

The package includes the NVIDIA deep learning SDK, which accelerates widely used open source deep learning frameworks such as Caffe, CNTK, TensorFlow, Theano, and Torch.


(Source: NVIDIA)

Why Does This Matter?

NVIDIA is justifiably proud of its achievement, but why does it matter? Our point of view is that this is an important advancement in two ways. First, it expands the size of a practical deep learning system.  Second, it encourages innovation.

McHugh told us the company worked closely with the big public cloud providers during the product’s development. While they did not fund development, they are expected to be early customers of the DGX-1. One of the providers wants to build a deep learning neural net with 120 layers. By our count, this would be a neural structure with 2 to the 120th nodes, or roughly 1.33 × 10^36 (about 1.33 trillion trillion trillion). A network of that size would not be feasible on any slower system.
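The power of two cited above can be checked exactly, since Python integers have arbitrary precision. This short sketch only verifies the arithmetic; it makes no claim about how a layer count maps to a node count in a real network.

```python
# Exact value of 2 to the 120th power, using Python's arbitrary-precision integers.
nodes = 2 ** 120

# Full value with thousands separators, then a compact scientific form.
print(f"{nodes:,}")
print(f"{nodes:.3e}")

# Number of decimal digits in the exact value.
print(len(str(nodes)))
```

The exact value has 37 digits, so the rounded figure in the text is best read as an order-of-magnitude statement.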

The DGX-1 encourages innovation by cutting large neural net training time down from several weeks to just a few hours.  Using an industry benchmark neural network, training performance is up to 12 times faster than NVIDIA’s previous best.  If a job takes several weeks to run, the neural net’s developer is likely to be very conservative in terms of goals and methods. After all, the result of any neural net is not a given. If the job is completed in a few hours, it is possible to experiment and innovate. It is just human nature.

Will This Bet Pay Off?

With the interest of the big public clouds and leading enterprises, commercial success is very likely for the DGX-1.  Since deep learning is data agnostic, many other sectors can benefit by using this technology, as well. NVIDIA has identified life sciences, energy, and automotive (driver assistance technology) as early adopters, but the list is not likely to stop there.  Image recognition, speech recognition, translation, and natural language processing (NLP) all benefit from faster training times. Low-throughput image enhancement, high-resolution virtual reality (VR), and pricing of financial options were all use cases mentioned at this conference.

According to Tractica’s research, since much of the software used to build deep learning structures is open source, hardware is likely to account for a significant share of deep learning project budgets. In our recently published Deep Learning for Enterprise Applications report, we forecast that spending on GPUs as a result of deep learning projects will grow from $43.6 million in 2015 to $4.1 billion by 2024.
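To put the forecast in annual terms, the two endpoints imply a compound annual growth rate. The sketch below derives that rate in Python; the rate is our own arithmetic from the two figures above, not a number taken from the report, and the function name is illustrative.

```python
# Implied compound annual growth rate (CAGR) between the two forecast endpoints.
# Assumption: growth compounds once a year over the nine years from 2015 to 2024.
START_YEAR, END_YEAR = 2015, 2024
START_SPEND_USD = 43.6e6   # $43.6 million in 2015
END_SPEND_USD = 4.1e9      # $4.1 billion in 2024


def cagr(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start into end."""
    return (end / start) ** (1 / years) - 1


years = END_YEAR - START_YEAR
rate = cagr(START_SPEND_USD, END_SPEND_USD, years)
print(f"Implied CAGR over {years} years: {rate:.1%}")
```

Under these assumptions the implied rate is on the order of 65 percent a year, which means spending grows nearly a hundredfold over the forecast period.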

