Nvidia Ampere Architecture: What to Expect?

For the last couple of months, I have wanted to write an article on Nvidia’s upcoming GPUs.

But there was hardly any information or leaks to start with. In fact, I have more details on AMD’s Zen 3 & Big Navi than Ampere.

I’m expecting the first details on Ampere (or even the product launch) to surface at GTC 2020 but I decided to go with this article anyway and see how close I end up with my predictions.

Disclaimer: Most of the information in this article is based on rumors and speculation and hasn’t been officially confirmed by Nvidia. 

    Rumors & Leaks

All the Rumors
  • Manufactured using TSMC’s 7nm (N7) Process
  • Some chips could be fabricated using Samsung’s 7nm EUV Node instead
  • Extremely Power Efficient
  • GDDR6 Memory is expected. HBM2/HBM2E is less likely
  • Faster & higher-bandwidth GDDR6 Memory
  • Higher VRAM
  • Higher Clock Speeds (100-200 MHz higher)
  • PCIe Gen 4 Support
  • Re-design & Improvements in Ray Tracing Cores (RT Core Advanced)
  • Massive Improvement in Tensor Cores
  • Double the FP32 Units
  • Enhanced L1 Cache

Fabrication Process

Earlier, there were rumors that Nvidia would pick Samsung’s 7nm Node for Ampere. But later on, Nvidia confirmed that TSMC would be receiving most of the orders instead.

Nvidia will be using TSMC’s 7nm Process for most of the products in the Ampere lineup. This is the same Fabrication Node that AMD has used for Zen 2 & 1st Gen Navi.

Some other rumors are suggesting that Nvidia could possibly use TSMC’s N7P or the N7+ Node but that is less likely.

Some of the lower-end GPUs might utilize Samsung’s 7nm EUV Node instead. Earlier in the Pascal lineup, Nvidia did use Samsung’s 14nm Process for lower-tier cards like GTX 1050/1050 Ti (GP107) & GT 1030 (GP108) while the rest of the lineup was on TSMC’s 16nm.

The decision to use both TSMC & Samsung could have something to do with the possibility of supply issues getting worse as we move closer to the launch of the Xbox Series X & PlayStation 5.

TSMC’s 7nm HPC Node offers twice the density of their 12nm Node alongside massive improvements in both the performance & power efficiency. You can expect Nvidia to launch more GPUs that won’t require any PCI-E Power Connector.

Power Efficiency

Graphics Cards are about to get a lot more Power Efficient in the next 6 to 12 months. Nvidia will achieve that by shrinking their chips to 7nm while AMD is looking to do the same with architectural improvements on RDNA2.

Here’s my estimation on the Power Efficiency of Ampere.

In our gaming tests, the RX 5700 consumed 40% less power than the RX Vega 64 while being 5% faster. The difference in Vega (GCN) & Navi (RDNA) architectures also played a role in it but its contribution was not as significant as the die shrink.

I’m taking the example of these two specific Graphics Cards because they’re close in performance and the shrink from GlobalFoundries 14nm Node (for Vega 64) to TSMC’s 7nm (for RX 5700) is quite similar to the shrink from TSMC’s 12nm (Turing) to TSMC’s 7nm (Ampere).
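To make that comparison concrete, here is a minimal sketch of the arithmetic behind it. It uses only the two figures quoted above (40% less power, 5% faster). The assumption that power scales linearly with performance is my own simplification, so treat the result as a rough guide rather than a measurement.

```python
# Figures quoted above: the RX 5700 drew 40% less power than the RX Vega 64
# while being 5% faster in gaming. The Vega 64 is the 1.0 baseline.
relative_power = 1.00 - 0.40
relative_perf = 1.00 + 0.05

# Performance per watt relative to the Vega 64.
perf_per_watt_gain = relative_perf / relative_power

# Power needed to match Vega 64 performance, ASSUMING power scales
# linearly with performance (a simplification, not a measured result).
iso_perf_power = relative_power / relative_perf
iso_perf_saving = 1.0 - iso_perf_power

print(f"Performance per watt: {perf_per_watt_gain:.2f}x")
print(f"Power saving at equal performance: {iso_perf_saving:.0%}")
```

Under that assumption, the RX 5700 delivers about 1.75x the performance per watt, which works out to roughly 43% less power for the same performance.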

The other efficiency improvements will be a result of the improvements in the microarchitecture.

Based on the above results, I’m expecting a 45-70% reduction in power consumption for the same performance. But as every SKU will also be significantly faster, on average you can expect an Ampere card to consume 30-45% less power than its Turing counterpart.

Other Details

Nvidia is likely to use PCIe Gen 4 in the Ampere lineup. It will be interesting to see if higher-end Ampere cards perform any differently on PCIe 4.0 systems compared to PCIe 3.0. Even a minor difference could mean trouble for Intel as the upcoming 10th Gen Comet Lake-S CPUs are likely to stick with PCIe Gen 3.
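For a sense of scale, here is a rough sketch of the theoretical bandwidth of an x16 slot on each generation. It uses the published per-lane transfer rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and the 128b/130b encoding both generations share. The function name is my own, and real-world throughput will be lower because of protocol overhead.

```python
def pcie_bandwidth_gb_s(transfer_rate_gt_s: float, lanes: int = 16) -> float:
    """Approximate one-direction bandwidth in GB/s for PCIe 3.0/4.0.

    Both generations use 128b/130b encoding, so 128 of every 130 bits
    on the wire carry payload. Protocol overhead is ignored here.
    """
    encoding_efficiency = 128 / 130
    return transfer_rate_gt_s * encoding_efficiency * lanes / 8

gen3 = pcie_bandwidth_gb_s(8.0)    # PCIe 3.0 x16
gen4 = pcie_bandwidth_gb_s(16.0)   # PCIe 4.0 x16

print(f"PCIe 3.0 x16: {gen3:.2f} GB/s")
print(f"PCIe 4.0 x16: {gen4:.2f} GB/s")
```

That is roughly 15.75 GB/s against 31.5 GB/s in each direction. Whether a GPU can actually make use of the extra headroom in games is a separate question.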

With Turing, Nvidia was busy over-hyping Ray Tracing even though the performance hit was massive. With Ampere, the gaming experience with Ray Tracing turned on is about to get a whole lot better.

According to some rumors, Nvidia is redesigning the RT Cores and they will be significantly faster. The new cores might be named RT Core Advanced.

GDDR6 is expected but there is also a slim possibility for HBM2 or HBM2E for higher-tier cards. We can also expect Nvidia to use faster (16 Gbps?) and higher-bandwidth GDDR6 Memory on the flagship cards.

Most of the SKUs will have more VRAM. Also, expect the Tensor Cores to be remarkably faster. More VRAM & faster Tensor Cores will allow even mid-range Ampere cards to perform exceptionally well in Deep Learning. If that turns out to be true, expect more researchers & developers to train their models on their own systems rather than leveraging the power of cloud services.

Rumoured Specification

A Twitter user, KittyCorgi, has leaked some details of GA103 (RTX 3080) & GA104 (RTX 3070).

[Image: Ampere GA103 Diagram]

[Image: Ampere GA104 Diagram]

Image & Leak Credits: KittyCorgi on Twitter

The RTX 3080 is expected to have 3,840 CUDA Cores, about 30% more than the RTX 2080. It will feature a 320-bit Memory Bus and 10GB of GDDR6 Memory. The RTX 3070 is expected to have 3,072 CUDA Cores, a 256-bit Memory Bus and 8GB of GDDR6 Memory.

GPU   | SKU Name    | CUDA Cores | Memory Size | Memory Bus Width
GA102 | RTX 3080 Ti | -          | -           | 384-bit
GA103 | RTX 3080    | 3,840      | 10 GB GDDR6 | 320-bit
GA104 | RTX 3070    | 3,072      | 8 GB GDDR6  | 256-bit
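The bus widths above translate into peak memory bandwidth once you pick a data rate. Here is a minimal sketch of that calculation. The 16 Gbps figure is the rumored speed and applying it to all three cards is purely my assumption. The RTX 2080 Ti line (14 Gbps on a 352-bit bus) is there as a sanity check against a shipping card.

```python
def gddr6_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin data rate times bus width, in bytes."""
    return data_rate_gbps * bus_width_bits / 8

# Sanity check: the RTX 2080 Ti pairs 14 Gbps GDDR6 with a 352-bit bus.
turing_flagship = gddr6_bandwidth_gb_s(14, 352)
print(f"RTX 2080 Ti: {turing_flagship:.0f} GB/s")

# ASSUMPTION: 16 Gbps is a rumored data rate, not a confirmed spec,
# and it may not apply to every SKU.
ASSUMED_DATA_RATE_GBPS = 16
rumored_bus_widths = {"RTX 3080 Ti": 384, "RTX 3080": 320, "RTX 3070": 256}

ampere_bandwidth = {
    sku: gddr6_bandwidth_gb_s(ASSUMED_DATA_RATE_GBPS, width)
    for sku, width in rumored_bus_widths.items()
}
for sku, bandwidth in ampere_bandwidth.items():
    print(f"{sku}: {bandwidth:.0f} GB/s")
```

At 16 Gbps, the rumored bus widths would give 768 GB/s, 640 GB/s and 512 GB/s respectively, against 616 GB/s on the RTX 2080 Ti.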

Earlier, some of the rumors were also pointing out that the VRAM on these cards will be 16GB (for RTX 3080 Ti) and 12GB (for RTX 3080 & 3070) instead.

    Speculating the Performance

Before estimating the performance, let’s have a look at the performance of the past four generations of Nvidia’s mainstream flagship cards.

GPU Name | GTX 780 Ti | GTX 980 Ti | GTX 1080 Ti | RTX 2080 Ti
Fabrication Process | TSMC’s 28nm | TSMC’s 28nm | TSMC’s 16nm | TSMC’s 12nm
Die Size | 561 mm² | 601 mm² | 471 mm² | 754 mm²
Transistor Count | 7.1 Billion | 8.0 Billion | 12.0 Billion | 18.6 Billion
CUDA Cores | 2,880 | 2,816 | 3,584 | 4,352
Base / Boost Clock Speed | 875 MHz / 928 MHz | 1,000 MHz / 1,075 MHz | 1,480 MHz / 1,582 MHz | 1,350 MHz / 1,545 MHz
Memory Type / Bandwidth | GDDR5 / 336 GB/s | GDDR5 / 336 GB/s | GDDR5X / 484 GB/s | GDDR6 / 616 GB/s
Architecture | Kepler | Maxwell | Pascal | Turing
Launch | Q4 2013 | Q2 2015 | Q1 2017 | Q3 2018
MSRP | $699 | $649 | $699 | $999
Relative Performance (Gaming) | 100% | 154% | 256% | 341%
Performance Gain (From Last Generation) | - | 54% | 66% | 33%
Equivalent To (Gaming Performance) | GTX 1060 3GB | GTX 1660 Ti | Between RTX 2070 Super & RTX 2080 | -
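The Performance Gain row follows directly from the Relative Performance row. As a quick check, this short sketch recomputes each generational gain from the index values in the table. The variable names are mine.

```python
# Relative gaming performance from the table, GTX 780 Ti = 100.
relative_perf = {
    "GTX 780 Ti": 100,
    "GTX 980 Ti": 154,
    "GTX 1080 Ti": 256,
    "RTX 2080 Ti": 341,
}

names = list(relative_perf)
gains = {}
# Compare each card with its predecessor.
for prev, cur in zip(names, names[1:]):
    gains[cur] = relative_perf[cur] / relative_perf[prev] - 1
    print(f"{prev} -> {cur}: {gains[cur]:+.0%}")
```

This reproduces the 54%, 66% and 33% gains listed in the table.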

The performance gain on the last generation was decent but not as great as the previous two generations. One of the primary reasons for that was the fact that Nvidia was more focused on Ray Tracing than the raw performance. Another limitation was the physical size of the chip that drove up the manufacturing costs and significantly increased the chances of defective dies in the wafer.

Remember, TSMC’s 12nm Node is not a die shrink from 16nm; it is just a process refinement of 16nm. That is why Nvidia had to make a massive chip (RTX 2080 Ti) in order to increase the core count and add new features that require RT Hardware & Tensor Cores.

Moving from TSMC’s 28nm on Maxwell to 16nm on Pascal offered nearly twice the transistor density. Moving from 12nm to 7nm should again result in a similar density increase.
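The density claim can be checked against the transistor counts and die sizes in the comparison table. This is a rough sketch: it divides the whole-chip transistor count by the die area, which says nothing about how those transistors are spent (cache, RT hardware, and so on).

```python
# (transistors in billions, die size in mm^2), taken from the table above.
cards = {
    "GTX 980 Ti (28nm)": (8.0, 601),
    "GTX 1080 Ti (16nm)": (12.0, 471),
    "RTX 2080 Ti (12nm)": (18.6, 754),
}

# Density in millions of transistors per mm^2.
density = {
    name: billions * 1000 / area_mm2
    for name, (billions, area_mm2) in cards.items()
}
for name, value in density.items():
    print(f"{name}: {value:.1f} MTr/mm^2")

maxwell_to_pascal = density["GTX 1080 Ti (16nm)"] / density["GTX 980 Ti (28nm)"]
pascal_to_turing = density["RTX 2080 Ti (12nm)"] / density["GTX 1080 Ti (16nm)"]
print(f"28nm -> 16nm: {maxwell_to_pascal:.2f}x")
print(f"16nm -> 12nm: {pascal_to_turing:.2f}x")
```

The numbers come out at about 1.9x from 28nm to 16nm and about 0.97x from 16nm to 12nm, which fits both the "nearly twice" claim and the point that 12nm is not a real shrink.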

The performance of Ampere will be driven by three major factors:

  • Architectural Improvements
  • Higher Frequency due to 7nm Process
  • More SMs due to Higher Transistor Density

While it is hard to predict the performance improvement without knowing anything about the architecture, I expect the performance difference between RTX 3080 Ti & 2080 Ti to be larger than that of 2080 Ti & 1080 Ti simply because the shrink from 12nm to 7nm will be a massive contributor in itself.

I’m estimating the RTX 3080 Ti to be 35 to 45% faster than the RTX 2080 Ti at Gaming.
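Here is what that estimate would mean on the relative performance scale used in the comparison table (GTX 780 Ti = 100%). This is a sketch of my own prediction, not a benchmark.

```python
# Relative gaming performance of the RTX 2080 Ti, GTX 780 Ti = 100.
RTX_2080_TI_INDEX = 341

# My estimated gain range for the RTX 3080 Ti (speculation, not measured).
low_gain, high_gain = 0.35, 0.45

low_index = RTX_2080_TI_INDEX * (1 + low_gain)
high_index = RTX_2080_TI_INDEX * (1 + high_gain)

print(f"Projected RTX 3080 Ti: {low_index:.0f}% to {high_index:.0f}%")
```

That puts the RTX 3080 Ti at roughly 460% to 494% on that scale, or close to five times the gaming performance of a GTX 780 Ti.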

The performance-per-watt metric is going to be even more impressive.

    Launch Date & Pricing

Nvidia could introduce Ampere in March at GTC 2020. We might also get the full specification for RTX 3080 Ti and the RTX 3080 in the same event alongside the first performance numbers.

The other major events where the launch is possible are Computex 2020 (June), E3 (June), or Gamescom (August).

When it comes to product availability, I’m guessing late Q2 or Q3.

The RTX 3070 & 3060 should follow up next and these cards should be available by late Q3 or early Q4.

Next up, let’s talk about the pricing. While 7nm costs significantly more than 12nm, the reduction in chip size & AMD’s RDNA2 threat should keep the pricing similar to Turing for low-end and mid-range cards. The pricing for higher-tier cards could be slightly reduced or it could stay the same.

The current (Nov 2019) known defect density of TSMC’s 7nm Node is 0.09 defects per cm² and that is very close to their 12nm process. The reduction in die size should help compensate for the defects even further.
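To show why die size matters so much here, this sketch uses a simple Poisson yield model, where the share of defect-free dies is exp(-area × defect density). The model choice is my assumption (foundries use more refined ones), and so is reading the 0.09 figure as defects per cm². The 500 mm² die is hypothetical. The 754 mm² die is the size of the RTX 2080 Ti.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson yield model.

    ASSUMPTION: defects are randomly distributed and any single defect
    kills the die. Real chips can often be salvaged as cut-down SKUs.
    """
    die_area_cm2 = die_area_mm2 / 100.0
    return math.exp(-die_area_cm2 * defects_per_cm2)

D0 = 0.09  # quoted defect figure, assumed to be defects per cm^2

big_die = poisson_yield(754, D0)    # RTX 2080 Ti-sized die
small_die = poisson_yield(500, D0)  # hypothetical smaller die

print(f"754 mm^2 die: {big_die:.0%} defect-free")
print(f"500 mm^2 die: {small_die:.0%} defect-free")
```

Under these assumptions, shrinking the die from 754 mm² to 500 mm² lifts the share of defect-free dies from about half to nearly two thirds, before counting the extra dies that fit on each wafer.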

If the die size of GA102 turns out to be close to 500 mm², I’m expecting the price tag for RTX 3080 Ti to be around $899. And for the RTX 3080, I’m expecting $649.



