NVIDIA’s Ampere will merge inference and training workloads in the data center and cloud


At the GTC 2020 Keynote, NVIDIA announced its latest compute platform for enterprises, Ampere. This platform is a giant step up from the previous generation, as detailed in NVIDIA’s A100 Tensor Core GPU Architecture white paper. While the company achieved many impressive feats, a few chipset specs jump out and have the potential to disrupt the AI chipset market.

Performance boost largely for inference

The most drastic change is the jump in inference performance from the V100 to the A100. Using Tensor Cores (units that perform matrix multiply-accumulate operations for a given data format), the A100 delivers 624 TOPS on 8-bit integer (INT8) data, a 10x increase over the V100. The V100 was not really optimized for inference: although it offered an impressive 116 Tensor TFLOPS, its integer performance was only 62 TOPS. Because production inference rarely uses floating-point data types, the V100 was confined primarily to training workloads. Training performance, by comparison, has seen only a modest increase, from 125 TFLOPS on the V100 to 156 TFLOPS on the A100, and that gain comes via a different data format (TF32 on the A100 versus FP16 on the V100).
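The generational gains quoted above are simple ratios of rated peak throughput. The short sketch below reproduces them from the figures in the text. It assumes dense (non-sparse) Tensor Core ratings and, as noted above, the training comparison crosses data formats (FP16 on the V100, TF32 on the A100), so it is not a like-for-like benchmark of real workloads.

```python
# Rated peak throughput figures quoted in the article (dense, no sparsity).
V100_INT8_TOPS = 62
A100_INT8_TOPS = 624
V100_TRAIN_TFLOPS = 125  # FP16 mixed precision on the V100
A100_TRAIN_TFLOPS = 156  # TF32 on the A100 (a different data format)


def speedup(new: float, old: float) -> float:
    """Generational gain expressed as a ratio of rated peak throughput."""
    return new / old


inference_gain = speedup(A100_INT8_TOPS, V100_INT8_TOPS)
training_gain = speedup(A100_TRAIN_TFLOPS, V100_TRAIN_TFLOPS)

print(f"INT8 inference gain: {inference_gain:.1f}x")
print(f"Training gain:       {training_gain:.2f}x")
```

The inference ratio works out to roughly 10x, while the training ratio is about 1.25x, which is the asymmetry the section is pointing at.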

NVIDIA has created a separate, low-cost product line for inference, of which the T4 is the current state of the art. Because production inference pipelines have more or less standardized on the INT8 format, the T4 was optimized for 8-bit inference, with 130 TOPS of INT8 performance.

Impact on cost per compute

The biggest difference between training and inference users is their willingness to pay for compute. Training systems are funded from the R&D budget, whereas inference systems fall under IT’s opex budget, so the inference chipset user’s primary concern is price per inference. At the T4’s list price of $3,000 per card, INT8 compute costs $23/TOPS (see table below). The A100’s price has not been officially announced, but if it is similar to the V100’s (~$12,000), the A100’s cost drops below the T4’s, to $19/TOPS.
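The cost-per-compute comparison is just card price divided by rated INT8 throughput. The sketch below reproduces both figures under stated assumptions: the A100 price is the article’s own assumption (about the V100’s ~$12,000, not an announced price), and the T4 is taken at its datasheet rating of 130 INT8 TOPS, which is the rating that yields $23/TOPS (3,000 / 130 ≈ 23). Both ratings are dense peak numbers, not measured throughput.

```python
def price_per_tops(card_price_usd: float, int8_tops: float) -> float:
    """Dollars paid per TOPS of rated (peak, dense) INT8 throughput."""
    return card_price_usd / int8_tops


# (card price in USD, rated INT8 TOPS)
cards = {
    "T4": (3_000, 130),
    # Assumption: A100 priced like the V100 (~$12,000); not an announced figure.
    "A100 (assumed price)": (12_000, 624),
}

for name, (price, tops) in cards.items():
    print(f"{name}: ${price_per_tops(price, tops):.0f}/TOPS")
```

Because the metric divides by peak throughput, it favors whichever card a given workload can actually keep busy; an underutilized A100 would cost more per delivered TOPS than this figure suggests.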

Going forward, NVIDIA’s A100 will stand out as an inference solution. It will be valuable to companies looking to maximize resource utilization in the data center: resources that are not in use for training can be allocated to inference. Cloud companies and hyperscalers in particular will relish this capability, because they can offer the same chipsets to a wider range of users for both inference and training workloads, maximizing their revenue potential. The virtualization software NVIDIA introduced with Ampere can add further to this value.

The T4 will live on

This doesn’t mean that the T4 product line will end. It will most likely find a home in other products, particularly for edge inference. The compute requirements for inference workloads at the edge are rising, and the T4 will fill that gap. The emergence of edge cloud and 5G will also encourage the use of edge servers and workstations, creating opportunities for T4-like chipsets. Omdia expects edge workloads to remain primarily inference until suitable training frameworks and applications emerge. Once that happens, T4-like chipsets will also need floating-point data paths for training.

AI compute continues to scale

The AI chipset market is large, and the only constant in the AI world is change. We have seen rapid advances in the past three years, jumping from 10 TFLOPS (P100) to more than 1 PetaOPS (the A100 is rated at 1,248 TOPS for dense INT4 and 2,496 TOPS for sparse INT4). Demand for AI compute in the enterprise market is increasing, and many semiconductor product lines are emerging across the industry for different applications.
