Tensor Cores Explained: Do you need them?

If you’re into AI/Machine Learning, or if you’re a tech/semiconductor industry enthusiast, you have probably come across the term “Tensor Cores”.

There is a lot of confusion regarding what they are and how they are related to GPUs. At Tech Centurion, we believe in simplifying even the most complex tech for you. So here is an article that demystifies Tensor cores.

Essentially, Tensor cores are processing units that accelerate matrix multiplication. The technology was developed by Nvidia for its high-end consumer and professional GPUs, and it is currently available on a limited set of GPUs, such as those in the GeForce RTX, Quadro RTX, and Titan families. It can offer improved performance in AI, gaming, and content creation.

This results in faster Deep Learning/AI Performance and speeds up Neural Network training.

Before we discuss what Tensor cores are, let us first look at what tensors are. A rough idea of tensors is needed to understand Tensor cores.

    What is a Tensor?

A tensor is a data structure that can represent many kinds of data. It is challenging to find an intuitively satisfying definition of a tensor. Still, it can be thought of as a container in which a multi-dimensional data set can be stored.

If that still doesn’t ring any bells, think of a tensor as a cookie jar containing cookies [real cookies, not browser cookies] and packets of cookies. The cookies represent data, and the cookie jar represents the tensor. Cookies are kept in packs, which are similar to ordered groups of data.

A somewhat accurate representation of a tensor is shown here:

Tensor Explained

A lot of people try to define Tensor with the help of physics. They make them appear as a generalization of scalars and vectors.

That description is accurate, as long as the direction is kept straight: a tensor generalizes scalars and vectors, so a scalar (rank 0) and a vector (rank 1) are special cases of a tensor, not the other way around.

A tensor can be considered an extension of a matrix. Matrices are two-dimensional structures containing numbers, but a tensor is a multidimensional set of numbers.
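To make the idea of "dimensions" concrete, here is a minimal sketch in plain Python that treats nested lists as tensors. The helper name `shape` is our own for illustration and assumes regularly nested, non-empty lists; real frameworks provide their own tensor types.

```python
def shape(t):
    """Return the shape of a regularly nested, non-empty list as a tuple."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]  # descend one level; assumes every level is non-empty
    return tuple(dims)

scalar = 7.0                                              # rank 0: a single number
vector = [1.0, 2.0, 3.0]                                  # rank 1: a row of numbers
matrix = [[1.0, 2.0], [3.0, 4.0]]                         # rank 2: rows and columns
cube = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]  # rank 3: 2 x 3 x 4

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    dims = shape(t)
    print(f"{name}: shape={dims}, rank={len(dims)}")
```

The rank is simply the number of entries in the shape, which is why a matrix is the rank-2 case of a tensor.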

    Nvidia’s Tensor Cores

All microprocessors are designed to carry out arithmetic and logical operations. One arithmetic operation of particular importance is matrix multiplication. Multiplying two 4×4 matrices involves 64 multiplications and 48 additions. Convolution and matrix multiplication are the areas where the new cores shine.
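The 64 and 48 figures can be checked with a short sketch. The function `matmul_counted` below is a hypothetical helper written for this article: it runs the textbook triple loop and counts every scalar multiplication and addition. The last loop prints how the counts grow for larger square matrices, using the general counts n³ multiplications and n²(n−1) additions.

```python
def matmul_counted(a, b):
    """Multiply a (n x k) by b (k x m) and count the scalar operations used."""
    n, k, m = len(a), len(b), len(b[0])
    mults = adds = 0
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = a[i][0] * b[0][j]       # first product, no addition needed
            mults += 1
            for p in range(1, k):
                acc += a[i][p] * b[p][j]  # one multiplication and one addition
                mults += 1
                adds += 1
            c[i][j] = acc
    return c, mults, adds

identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
b = [[4 * i + j for j in range(4)] for i in range(4)]

product, mults, adds = matmul_counted(identity, b)
print(mults, adds)  # 64 48

# Operation counts for larger square matrices
for n in (4, 8, 16):
    print(n, n ** 3, n * n * (n - 1))
```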

The computational cost grows rapidly as the size and dimensions of the matrix (tensor) go up. Machine learning, deep learning, and ray tracing are tasks that involve an enormous number of multiplications.

Nvidia first introduced this technology with GPUs based on the Volta architecture. The Tesla V100 data-center GPU was the first to feature it, and the Nvidia Titan V was the first desktop card to do so. Back then, the Titan V was the strongest (and costliest) desktop GPU ever made.

Nvidia Titan V Volta

Sadly, these new Tensor cores were not available in any GeForce series GPUs at the time. Only when Nvidia released the Turing architecture-based GeForce RTX GPUs were Tensor cores made available to consumers. These GPUs were also much stronger than the previous-gen GPUs based on the Pascal architecture.

Watch Nvidia’s Michael Houston explain Tensor cores in a video posted by Nvidia Developers YouTube channel:

    Tensor Cores vs. CUDA Cores

CUDA cores have been present on every single GPU developed by Nvidia in the past decade while Tensor Cores have recently been introduced.

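Tensor cores can compute a lot faster than CUDA cores. A CUDA core performs one scalar operation per clock cycle, whereas a Tensor core performs many operations per clock cycle: in the Volta design, each Tensor core completes a full 4×4 matrix multiply-and-accumulate every cycle.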

Everything comes with a cost, and here, the cost is precision. Tensor cores gain their speed by working on lower-precision inputs, such as 16-bit floating-point numbers. CUDA cores, on the other hand, can compute in full 32-bit precision and produce very accurate results.
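The precision trade-off can be illustrated with a small sketch. It assumes the mixed-precision scheme Nvidia described for Volta (16-bit inputs, products accumulated at higher precision) and emulates it in plain Python: `to_fp16` rounds a value to half precision using the standard `struct` module, and an ordinary Python float stands in for the higher-precision accumulator. All function names here are our own, not part of any Nvidia API.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def dot_fp16_accumulate(xs, ys):
    """Dot product where inputs, products, and the running sum are all FP16."""
    acc = 0.0
    for x, y in zip(xs, ys):
        product = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + product)  # the running sum is rounded every step
    return acc

def dot_mixed(xs, ys):
    """Dot product with FP16 inputs but a higher-precision running sum."""
    acc = 0.0  # Python float (64-bit) stands in for the FP32 accumulator
    for x, y in zip(xs, ys):
        acc += to_fp16(x) * to_fp16(y)
    return acc

xs = [0.1] * 1000
ys = [1.0] * 1000
exact = 100.0

print("0.1 stored as FP16:", to_fp16(0.1))
print("FP16 accumulation error: ", abs(dot_fp16_accumulate(xs, ys) - exact))
print("mixed accumulation error:", abs(dot_mixed(xs, ys) - exact))
```

In this toy case, rounding the inputs alone costs little, while rounding the running sum at every step drifts noticeably further from the exact answer. That is the motivation for keeping the accumulator at higher precision.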

For machine learning models, CUDA cores are not as effective as Tensor cores in terms of both cost and computation speed. Hence, Tensor cores are the preferred choice for training machine learning models.

If you are a developer trying to learn in-depth about this technology, check out Nvidia’s official blog for developers, which includes dozens of posts on this topic.

    Applications of Tensor cores

Fundamentally, all Tensor cores do is accelerate matrix multiplication, so they can be applied almost anywhere that operation dominates. Here are some of the most prominent applications:

Artificial Intelligence and Machine Learning

AI is the present and AI is the future. Deep learning involves handling a massive collection of data. If you are familiar with machine learning basics, you would know how the data set is passed through multiple layers of neural networks. This is nothing but tons of matrix multiplication.
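As a rough illustration of why neural networks boil down to matrix multiplication, the sketch below pushes a batch of inputs through a stack of fully connected layers. It is a simplified model written for this article: biases are omitted, ReLU is assumed as the activation, and the names `matmul`, `relu`, and `forward` are our own.

```python
def matmul(a, b):
    """Plain matrix multiplication of a (n x k) by b (k x m)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    """Element-wise ReLU activation: negative values become zero."""
    return [[max(0.0, v) for v in row] for row in m]

def forward(batch, layers):
    """Pass a batch through each layer: one matrix multiplication per layer."""
    out = batch
    for weights in layers:
        out = relu(matmul(out, weights))
    return out

batch = [[1.0, 2.0, 3.0],
         [0.5, -1.0, 2.0]]                 # 2 samples, 3 features each
layer1 = [[0.2, -0.1, 0.4, 0.0],
          [0.7, 0.3, -0.5, 0.1],
          [-0.3, 0.8, 0.2, 0.6]]           # 3 inputs -> 4 hidden units
layer2 = [[0.5, -0.2],
          [0.1, 0.9],
          [-0.4, 0.3],
          [0.6, 0.2]]                      # 4 hidden units -> 2 outputs

print(forward(batch, [layer1, layer2]))
```

Every layer adds one more matrix multiplication, and training repeats this for every batch, which is the workload Tensor cores are built to accelerate.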

Nvidia’s GPUs power many workstations today. Many of the world’s supercomputers also run on Nvidia GPUs, which lets computer engineers make use of this technology.

Self Driving Electric Vehicles

Electrical and computer engineers working on next-gen vehicles can use Tensor cores too. Nvidia GPUs are a good fit for simulating electrical power converters and training self-driving algorithms.

Media and Entertainment

High-performance computers can come in real handy in 4K content creation. Creating 4K graphics and videos requires some serious computational power.

Academics and Research Labs

Universities working on AI and ML algorithms need to simulate their models. Having a platform that can accelerate simulations can do wonders.

Academics need to train and test their models, which may take hours or even days. A high-performance computer speeds up the process and helps them keep up with conference deadlines.


What if you are not a computer science engineer who is training machine learning algorithms every day? Is there anything you should be concerned about?

The answer is yes.

Nvidia’s RTX series GPUs support another feature called DLSS (Deep Learning Super Sampling). With DLSS, the game is rendered at a lower resolution, and a neural network running on the Tensor cores upscales each frame to a higher resolution. With DLSS turned on, your PC can render a game at, say, 1080p and upscale it to 1440p.
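A quick back-of-the-envelope calculation shows where the FPS boost comes from. The sketch below assumes the usual 16:9 meanings of the two resolutions (1920×1080 and 2560×1440) and simply compares pixel counts; it does not model DLSS itself.

```python
def pixel_count(width, height):
    """Number of pixels in one frame at the given resolution."""
    return width * height

rendered = pixel_count(1920, 1080)   # what the GPU actually renders (1080p)
displayed = pixel_count(2560, 1440)  # what ends up on screen (1440p)

# Share of the displayed pixels that the upscaler has to fill in
generated_share = 1 - rendered / displayed

print(f"rendered:  {rendered:,} pixels")
print(f"displayed: {displayed:,} pixels")
print(f"filled in by upscaling: {generated_share:.1%}")
```

The GPU renders only a little over half of the pixels that reach the screen, and the remaining share is reconstructed by the upscaling network.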

Of course, the quality would not be as good as native 1440p, but the FPS boost you get with DLSS makes it worthwhile. DLSS 1.0 wasn’t great; a lot of detail was lost during upsampling. But since Nvidia introduced DLSS 2.0, things have changed for the better.

The technology has been around for quite some time now, but only a handful of games support DLSS or even Ray Tracing. Control and Shadow of the Tomb Raider are among the games that support both DLSS and Ray Tracing.

Upcoming titles like Call of Duty: Black Ops Cold War, Cyberpunk 2077, and Watch Dogs: Legion will support this tech as well. It is getting increasingly difficult for previous-gen hardware to keep up with the demands of newer games.

If you’re interested in checking our GPU Benchmarks, you should look at our Graphics Card Rankings.

With the announcement of Nvidia’s second-generation real-time ray tracing, maintaining both 4K resolution and 60 FPS has become a challenge. Only the best GPUs can keep up with this fast-growing requirement.

Ray Tracing is a super-intensive process. To deliver playable FPS with RTX turned on, game developers have to put a huge amount of work into optimizing their games. The addition of denoising algorithms adds to the burden. Tensor cores are expected to aid the Ray Tracing cores by improving AI denoising in the future.

Though most of these processes are still being carried out on CUDA cores, the Ray Tracing Cores and Tensor Cores will soon play an essential role in the process.

    Specialized ASICs for machine learning

General-purpose processors are known to offer poor performance for machine learning and deep learning applications. There has been growing interest in developing ASICs (Application-Specific Integrated Circuits) for Artificial Intelligence applications. Google is one company that has developed its own tech for this purpose.

Google’s TPUs

Google has developed Tensor Processing Units (TPUs) with a similar purpose. From the second generation onward, they are offered through Google Cloud as Cloud TPUs. These processors are application-specific: they are built for neural network workloads rather than general-purpose computing.

The search giant has provided a cloud-based platform that lets users run their training models on the cloud. The user doesn’t need to own a TPU to use its functions; Google allows anyone to rent a TPU for a fee and run machine learning algorithms.

Nvidia doesn’t see it as a threat as long as Google uses these TPUs only for its own applications. Still, Google not using Nvidia’s processors has a significant impact on Nvidia’s business, as Google could have been one of its biggest customers. If Google ever starts selling these chips, it would give Nvidia tough competition.

    Are they worth the hype?

It is tough to provide a black and white answer to this question. Nvidia’s GeForce RTX 30 series GPUs offer outstanding performance in AI applications. It makes perfect sense to buy one if you are working in this field. Google’s own machine learning framework, TensorFlow, supports acceleration with the help of Tensor cores.

Given that they offer features available in Nvidia’s Quadro lineup at a fraction of the price, GeForce RTX cards are a pretty good choice for Machine Learning and AI enthusiasts.

Another way one can take advantage of Tensor Cores is through a Cloud provider such as Amazon AWS or Google Cloud. This is helpful to those who don’t want to invest in a high-end PC.

Even years after their release, Tensor cores are found on only a small number of high-end consumer graphics cards. GPUs like the GeForce 16 series still don’t have them. Intel and AMD don’t offer them in their GPUs. However, they will likely develop an alternative to Nvidia’s Tensor Cores in the upcoming years.

With these RTX GPUs, small businesses, engineers, and students can start studying and experimenting with machine learning. In the future, we might even begin to see more ASICs built for machine learning-related tasks.
