NVIDIA’s Latest Offerings Bolster Its Position in Deep Learning Inference


Santa Clara-based NVIDIA is widely recognized throughout the industry for having the fastest chips for training deep learning convolutional neural networks (CNNs). At NVIDIA’s GTC China conference, CEO Jen-Hsun Huang announced two new products, the Tesla P4 and P40, that will extend the company’s technology into another important artificial intelligence (AI) area: neural net inference. As the name suggests, graphics processing units (GPUs) were developed to improve the computer graphics experience by offloading computationally intense image processing tasks from the central processing unit (CPU). That same massively parallel architecture makes them well suited to training deep learning algorithms. Actually running a trained neural net, however, imposes very different performance requirements: inferring a pattern from new data requires fewer computations, and those computations can be less precise, but they must complete faster and consume less energy than training does.

Why There Is a Need

Deep learning AI services such as voice-based language translation, email spam filters, and product recommendation engines are rapidly growing in complexity. As of this writing, many require up to 10 times more compute resources than similar versions did only a year ago. A lack of real-time responsiveness leads to a poor user experience and slows the pace of adoption.

How It Works

The particular strength of a GPU is performing large numbers of floating point calculations in parallel. This lets computer displays increase their level of detail and complexity without sacrificing system performance, and modern gaming would not be possible without it. The two new products, the Tesla P4 and P40 GPU accelerators, along with their associated software, are specifically designed for inference. Based on the Pascal architecture, these new GPUs execute specialized inference instructions on 8-bit integer registers instead of 32-bit floating point registers. Although the results are less precise, they are delivered much faster. NVIDIA claims the new GPUs respond up to 45x faster than CPUs and 4x faster than the previous generation of GPUs.
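The trade-off behind 8-bit inference can be illustrated with a simple sketch. The code below is not NVIDIA’s implementation; it is a minimal, illustrative example of symmetric linear quantization, the general technique of mapping 32-bit float weights to 8-bit integer codes plus a scale factor, which is the kind of precision-for-speed bargain INT8 inference relies on.

```python
# Illustrative sketch of symmetric INT8 quantization: each float32
# weight is mapped to an 8-bit integer code plus one shared scale.

def quantize_int8(weights):
    """Map float weights to int8 codes and a scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.42, -1.27, 0.003, 0.9]
codes, scale = quantize_int8(weights)
approx = dequantize(codes, scale)
# Each 8-bit code needs a quarter of the storage of a float32 weight,
# and integer multiply-accumulate is cheaper in silicon; the price is
# a small rounding error (at most half a quantization step) per value.
```

In a real accelerator the same idea is applied per layer or per channel with hardware integer arithmetic; this sketch only shows why the results are slightly less precise but far cheaper to compute.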


A self-driving car or a self-navigating drone must recognize potentially dangerous patterns in real time. For safety reasons, it cannot send the data to the cloud and wait for a response; the delay could be fatal. Neural nets in field devices must therefore respond more quickly, while using less energy, than their counterparts in cloud data centers. The result is an engineering compromise that requires considerable finesse to keep performance within safety parameters.
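A back-of-the-envelope calculation makes the stakes concrete. The latency figures below are illustrative assumptions, not measurements from any vendor or vehicle; the point is only how far a car travels while waiting on a result.

```python
# Sketch: distance a vehicle covers while waiting on a cloud round trip
# versus an on-board inference pass. Latencies are assumed, not measured.

SPEED_MPH = 65
METERS_PER_MILE = 1609.344
speed_mps = SPEED_MPH * METERS_PER_MILE / 3600  # roughly 29 m/s

def distance_traveled(latency_ms):
    """Meters covered before a result arrives at the given latency."""
    return speed_mps * latency_ms / 1000

cloud_round_trip_ms = 150   # assumed network + queueing + compute
onboard_inference_ms = 10   # assumed local inference pass

print(f"cloud:   {distance_traveled(cloud_round_trip_ms):.1f} m")
print(f"onboard: {distance_traveled(onboard_inference_ms):.1f} m")
```

Under these assumed numbers, a highway-speed car travels several meters blind while waiting on the cloud but well under a meter with on-board inference, which is why inference has to move into the device.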

What the Future Holds

In Tractica’s recent Artificial Intelligence Market Forecasts report, we forecast that global AI-driven revenue for GPUs will grow to $14.2 billion by 2025. This opportunity has not gone unnoticed in the industry. Intel recently acquired two companies, Nervana and Movidius, to jumpstart its deep learning capabilities, and other competitors like Qualcomm and Xilinx are also taking aim at the inference market. With the release of the Tesla P4 and P40 GPU accelerators and their new software, however, NVIDIA remains the company to beat in deep learning.
