Nvidia CUDA Cores Explained: How are they different?

Nvidia calls its parallel processing platform CUDA, while CUDA cores are the processing units inside a GPU, much like AMD’s Stream Processors.

CUDA is an abbreviation for Compute Unified Device Architecture. It is the name given to the parallel computing platform and API used to access the Nvidia GPU’s instruction set directly.

Unlike DirectX and OpenGL, CUDA does not require developers to learn a complicated graphics programming API; it works with popular programming languages like C and C++, along with Nvidia’s own proprietary extensions.

    CUDA Cores

If you have ever looked at the spec sheet of your Nvidia graphics card, you have almost certainly come across this term.

Let us consider an example to understand the working of CUDA cores. Think of the processor as a water tank. If you want to empty the tank, you will need to make use of pipes.

If you connect more pipes, then naturally you will be able to empty the tank faster. CUDA cores act like these pipes for the processor. More CUDA cores mean that more work can be processed at the same time, so the job finishes faster.
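The pipe analogy can be put into numbers with a minimal sketch. It assumes an idealized workload in which every core handles exactly one work item per round and there is no overhead; the function name and the workload size are made up for illustration.

```python
import math


def rounds_to_finish(work_items: int, cores: int) -> int:
    """Rounds needed when each core handles one work item per round."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return math.ceil(work_items / cores)


# The same 10,000 work items drained through more and more "pipes".
for cores in (1, 32, 512):
    print(cores, "cores ->", rounds_to_finish(10_000, cores), "rounds")
```

Real workloads rarely scale this cleanly, but the sketch shows why the core count matters for work that can be split into many independent pieces.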

Modern Nvidia GPUs come with three different types of processing cores:

  • CUDA cores
  • Tensor cores
  • Ray-Tracing cores

Each type of core is designed for a very specific purpose. Ray-Tracing cores are exclusive to Nvidia RTX graphics cards, and no AMD GPUs have Ray-Tracing cores. However, AMD’s upcoming RDNA 2 based GPUs will support hardware-accelerated ray tracing.

If you want to know more about Tensor cores, you can check out this post on tensor cores.

Graphics cards have had CUDA cores ever since the Tesla architecture. All Nvidia GPUs belonging to the Tesla, Fermi, Kepler, Maxwell, Pascal, Volta, Turing, and Ampere architectures have CUDA cores. The same cannot be said about Tensor cores or Ray-Tracing cores.

The first Fermi GPUs featured up to 512 CUDA cores, organized as 16 Streaming Multiprocessors (SMs) of 32 cores each. The GPUs supported up to 6GB of GDDR5 memory. Here is a block diagram which shows the structure of a Fermi CUDA core. Each CUDA core had a floating-point unit and an integer unit.

Nvidia CUDA Core

And the next figure shows the block diagram of Fermi GPUs.

Nvidia Fermi Architecture
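The Fermi figures quoted above follow from simple multiplication, which the short sketch below spells out. The helper name is hypothetical, and the second call is only an illustration of what disabling one SM would do to the total.

```python
def total_cuda_cores(sm_count: int, cores_per_sm: int) -> int:
    """Total CUDA cores = number of SMs x CUDA cores per SM."""
    return sm_count * cores_per_sm


# Full Fermi configuration described in the text: 16 SMs of 32 cores each.
print(total_cuda_cores(sm_count=16, cores_per_sm=32))

# Illustration only: the same design with one SM disabled.
print(total_cuda_cores(sm_count=15, cores_per_sm=32))
```

The same relationship holds for the later architectures; only the number of SMs and the number of cores per SM change.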

With the Kepler architecture, the CUDA core count was tripled. The Kepler architecture supported up to 1536 CUDA cores. It was based on the 28 nm fabrication process.

Nvidia Kepler Architecture Block Diagram

From here on, Nvidia kept slapping on more CUDA cores with each passing generation. Here is what the block diagram of the Nvidia Quadro GP100 looked like. It was part of the Pascal architecture released by Nvidia in 2016.

Both the Maxwell and Pascal architectures had 128 CUDA cores per streaming multiprocessor (SM). The integer unit was trimmed in the Maxwell architecture, removing the dedicated multiplication unit.

Nvidia GP100 Pascal Architecture Diagram

Nvidia’s Turing architecture brought a lot of changes to the GPUs. Here is what the block diagram of the TU102 GPU looked like.

The number of CUDA cores per SM was reduced to 64 (from 128). Tensor cores and Ray Tracing cores were added. The fabrication process used was TSMC’s 12 nm. The integer and floating-point units were separated starting from the Turing architecture.

TU102 Turing Block Diagram

The most recent Ampere architecture brought 2nd Gen Ray Tracing Cores. The GA100 GPU has 128 SMs, and the Ampere GA102 has 10,752 CUDA cores. This time, each core consists of two FP32 processing units (units which carry out 32-bit floating-point operations).

Nvidia Ampere GA100 Block Diagram

The interesting thing about these CUDA cores is that they can handle operations on both integers and floating-point numbers. This means each CUDA core in the Ampere architecture can handle either two FP32 operations or one FP32 and one INT operation per clock cycle.
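To see what this rule implies for mixed workloads, here is a toy model. It is a sketch of the rule exactly as stated above (per clock: two FP32 operations, or one FP32 plus one INT operation), not a description of Nvidia’s real scheduler, and the function name is hypothetical.

```python
import math


def min_cycles(fp32_ops: int, int_ops: int) -> int:
    """Fewest clock cycles for one core under the stated issue rule.

    Each cycle fits at most two operations in total and at most one
    INT operation, so both limits put a floor on the cycle count.
    """
    if fp32_ops < 0 or int_ops < 0:
        raise ValueError("operation counts must be non-negative")
    total_limit = math.ceil((fp32_ops + int_ops) / 2)  # two ops per cycle
    int_limit = int_ops                                # one INT per cycle
    return max(total_limit, int_limit)


for fp32, ints in ((100, 0), (60, 40), (20, 80)):
    print(f"{fp32} FP32 + {ints} INT -> {min_cycles(fp32, ints)} cycles")
```

Under this model, a pure FP32 workload and a 60/40 mix both finish 100 operations in 50 cycles, while an integer-heavy mix is limited by the single INT slot.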

Here is a block diagram of GA102 GPU based on Nvidia’s latest Ampere architecture.

The next generation of Nvidia’s GPUs will most likely be based on the 5 nm fabrication process. This will shrink the die size further, reducing the power requirements and bumping up the clock speeds to over 2 GHz.

As the developers start understanding the newer architectures better, they can better optimize their games and programs to further boost the performance.

    How do CUDA cores affect performance?

Each Nvidia GPU contains hundreds or thousands of CUDA cores. When it comes to processing power, there are a lot of things to consider when judging a GPU’s performance. GPU clock speeds, GPU architecture, memory bandwidth, memory speed, TMUs, VRAM, and ROPs are some of the other things that affect GPU performance.

VRAM holds the assets, textures, shadow maps, and all other data being processed by the GPU. Graphics cards store this data in VRAM because the GPU can access it much faster from there than from system RAM, an SSD, or an HDD.

Many factors affect the amount of VRAM that your system requires (like resolution). Most modern graphics cards come with VRAM sizes ranging from 2GB all the way up to 24GB (RTX 3090, we’re looking at you).
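As a rough illustration of why resolution matters, the sketch below estimates the size of a single uncompressed frame buffer. It assumes 4 bytes per pixel (32-bit color); the function name is hypothetical, and real games keep many such buffers plus textures in VRAM, so this is only one small piece of the total.

```python
def framebuffer_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of one uncompressed frame buffer in bytes."""
    return width * height * bytes_per_pixel


for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    size_mib = framebuffer_bytes(w, h) / (1024 * 1024)
    print(f"{name}: {size_mib:.1f} MiB per frame buffer")
```

Going from 1080p to 4K quadruples the pixel count, so every per-pixel buffer grows by the same factor.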

Coming to the clock speeds, there are two things that we need to discuss: the core clock and the memory clock. The core clock is the speed at which the GPU itself operates. The memory clock, on the other hand, is the speed at which the GPU’s VRAM operates. The core clock is similar to a processor’s clock speed, and the memory clock is similar to the speed of system RAM.
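The memory clock matters mostly because of the memory bandwidth it produces. A common back-of-the-envelope formula multiplies the effective data rate per pin by the width of the memory bus. The sketch below assumes that formula; the function name is hypothetical, and the RTX 3090 inputs (19.5 Gbps per pin, 384-bit bus) are taken from its publicly listed specifications rather than from this article.

```python
def peak_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    data_rate_gbps: effective data rate per pin, in gigabits per second.
    bus_width_bits: width of the memory bus, in bits.
    """
    return data_rate_gbps * bus_width_bits / 8  # 8 bits per byte


# Assumed inputs for the RTX 3090: 19.5 Gbps per pin on a 384-bit bus.
print(peak_bandwidth_gb_s(19.5, 384), "GB/s")
```

This is a theoretical ceiling; the bandwidth a game or compute kernel actually achieves is lower.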

Most CPUs in the mainstream market come with two to sixteen cores, which lets them perform operations in parallel. Graphical workloads involve a huge number of calculations that need to be carried out in parallel. When you look at GPUs, what is called a core is essentially the equivalent of a single floating-point unit in a CPU.

A GPU core cannot fetch or decode instructions on its own; it just carries out calculations. The number of CUDA cores is usually in the thousands on modern GPUs.

It is not possible to judge the performance of a graphics card based only on the number of CUDA cores. You have to take into account the graphics card’s architecture, clock speeds, number of CUDA cores, and the other factors mentioned above.

The number of CUDA cores can be a good indicator of performance if you compare GPUs within the same generation. The Nvidia GTX 960 has 1024 CUDA cores, while the GTX 970 has 1664 CUDA cores. The GTX 970 has more CUDA cores compared to its little brother, the GTX 960.
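For a quick same-generation comparison, the ratio of core counts gives a rough first-order estimate. The sketch below assumes everything else (clocks, memory, architecture) is equal, which is never exactly true; the function name is hypothetical.

```python
def core_count_ratio(cores_a: int, cores_b: int) -> float:
    """How many times more CUDA cores card A has than card B."""
    return cores_a / cores_b


# GTX 970 vs GTX 960, using the core counts quoted above.
print(f"{core_count_ratio(1664, 1024):.3f}x the CUDA cores")
```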

More CUDA cores mean better performance for GPUs of the same generation, as long as no other factor is bottlenecking the performance.

Things get a little tricky if you compare graphics cards of different generations and architectures. For example, the Nvidia GTX 1070 and the RTX 2060 both have fewer CUDA cores than a GTX 780. This does not mean that the GTX 780 can beat the GTX 1070 or RTX 2060 in any way.

This difference in performance arises from the different architecture, transistor size, and fabrication process between GPUs of different generations. The performance of a CUDA core depends a lot on the fabrication process and the GPU architecture. That is why a single CUDA core of a newer generation is much more powerful than its predecessor.

If we compare two GPUs from different generations, the GTX 980 Ti based on the Maxwell architecture and the GTX 1080 based on the Pascal architecture, here is what we observe.

Specification             GeForce GTX 980 Ti    GeForce GTX 1080
No. of transistors        8,100,000,000         7,200,000,000
No. of CUDA cores         2,816                 2,560
Transistors per core      2,876,420             2,812,500
Clock speed               1500 MHz              2000 MHz

From the above table, we can see that there is only a small difference in the number of transistors per CUDA core between the two graphics cards. If you go by these numbers alone, you will conclude that the performance of both graphics cards will be similar. However, that would be incorrect.
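The transistors-per-core row of the table can be reproduced with a few lines of arithmetic. This sketch only uses the transistor and core counts given in the table; the variable and function names are made up for illustration.

```python
# Transistor and CUDA core counts as listed in the table above.
cards = {
    "GTX 980 Ti": {"transistors": 8_100_000_000, "cuda_cores": 2816},
    "GTX 1080": {"transistors": 7_200_000_000, "cuda_cores": 2560},
}


def transistors_per_core(transistors: int, cuda_cores: int) -> int:
    """Whole transistors per CUDA core (fraction dropped, as in the table)."""
    return transistors // cuda_cores


per_core = {name: transistors_per_core(**spec) for name, spec in cards.items()}
for name, value in per_core.items():
    print(f"{name}: {value:,} transistors per core")

# Relative gap between the two cards.
gap = (per_core["GTX 980 Ti"] - per_core["GTX 1080"]) / per_core["GTX 1080"]
print(f"Difference: {gap:.1%}")
```

The gap works out to a little over 2%, which is why this ratio alone says almost nothing about the real performance difference.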

The smaller transistors in the Pascal architecture are what make the difference here. Smaller transistors reduce the overall power consumption and let the manufacturer pack more transistors into a small die. All this raises the maximum clock frequencies that Pascal GPUs can hit, which increases their overall performance.

Here is a hypothetical equation that shows how the performance of a GPU depends on various factors.

Performance of GPU =  number_of_cores * clock_frequency * architecture_multiplier;
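The equation above translates directly into a small function. This is a sketch of the article’s hypothetical formula, not a real benchmark: `architecture_multiplier` has no official value, so it is left at 1.0 here, which means the result ignores any per-core improvements between generations.

```python
def gpu_performance(number_of_cores: int,
                    clock_frequency_mhz: float,
                    architecture_multiplier: float = 1.0) -> float:
    """Hypothetical performance score: cores x clock x architecture factor."""
    return number_of_cores * clock_frequency_mhz * architecture_multiplier


# Core counts and clock speeds from the comparison table above.
gtx_980_ti = gpu_performance(2816, 1500)
gtx_1080 = gpu_performance(2560, 2000)

print(f"GTX 1080 vs GTX 980 Ti: {gtx_1080 / gtx_980_ti:.2f}x")
```

Even with the multiplier left at 1.0, the higher clock speed puts the GTX 1080 ahead despite its lower core count.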

Instead of solving some convoluted equation to find out how good your GPU is, it is always a better idea to look at real-world gaming or compute benchmarks. They show how the graphics card performs in real life, so you know what to expect from it before buying.

Let us now summarise what we have understood until this point.

  1. More CUDA cores mean more data can be processed in parallel.
  2. A higher clock speed means that each core can work faster.
  3. GPUs get better with new generations and architectures, so a graphics card with more CUDA cores is not necessarily more powerful than one with fewer CUDA cores.

    Stream Processors

A stream processor is AMD’s counterpart to an Nvidia CUDA core. Both AMD and Nvidia GPUs excel when it comes to graphical performance. Although CUDA cores and stream processors play the same role, they are not directly comparable because the underlying architectures are fundamentally different.

It would not be right to judge the performance of graphics cards just by comparing the number of CUDA cores against the number of stream processors.

It is also interesting to note that Nvidia has done a better job of maintaining developer support for CUDA, while AMD has always gone with aggressive pricing, so its GPUs tend to offer better value for money.

    Conclusion

This is one of the hottest questions in the industry. Which one is better? Nvidia’s CUDA or AMD’s stream processors? Well, both technologies have matured, and we are at a point where we can directly pit them against each other.

Though you will not see a visual difference between the two technologies on screen, they work very differently. If you are only concerned about gaming, the price-to-performance ratio should be your main focus. AMD has emerged as a really promising option for both CPUs and GPUs in recent years. AMD’s products offer great value for money and top-notch performance.

By putting CUDA cores in every kind of computing device, from supercomputers, desktops, and laptops to mobile devices and automobiles, Nvidia has made it easier than ever for developers to deploy their software on any platform they want to target.

No matter what happens, if you are a gamer, remember at all times that the graphics card which cranks out more FPS in AAA titles is always the winner.

If you want to know which is the Best Graphics Card for Gaming, check out our article.



