# Thoughts on Nvidia’s Kepler and Maxwell GPUs

Yesterday Randall posted on the Nvidia product roadmap. Let’s take a slightly deeper look at the new GPU chips that were announced. Before we do that, though, let’s look at the current GPU chip to help estimate what we could see in these future chips.

Tesla was named after Nikola Tesla. Fermi was named after the physicist Enrico Fermi. We know a lot about the Fermi GPU since it is publicly available. Fermi has 3.0 billion transistors and is manufactured by TSMC on a 40 nm process. The Tesla C2050 / C2070 GPU Computing Processor can do 515.2 GFlops in double precision. The only difference between the C2050 and the C2070 is the amount of memory: the C2050 has 3 GB and the C2070 has 6 GB. The C2070 draws about 247 watts and the C2050 about 238 watts. That puts the GFlops per watt ratio at approximately 2.1 for the C2070 and approximately 2.16 for the C2050. Taking a look at the roadmap chart, the C2070 coincides with the top edge of the Fermi block.
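The efficiency figures are simply peak double-precision throughput divided by board power. Below is a minimal Python sketch of that division, assuming the 515.2 GFlops peak for both boards and board powers of 238 W for the C2050 (an assumption, consistent with the 2.16 ratio) and 247 W for the C2070. The helper name `gflops_per_watt` is hypothetical.

```python
# Peak double-precision throughput (GFlops) shared by both Fermi boards.
FERMI_PEAK_GFLOPS = 515.2

# Board power in watts. 238 W for the C2050 is an assumption here;
# 247 W for the C2070 is the figure quoted in the text.
BOARD_POWER_W = {"C2050": 238.0, "C2070": 247.0}


def gflops_per_watt(peak_gflops: float, watts: float) -> float:
    """Peak double-precision GFlops delivered per watt of board power."""
    return peak_gflops / watts


for board, watts in BOARD_POWER_W.items():
    print(f"{board}: {gflops_per_watt(FERMI_PEAK_GFLOPS, watts):.2f} GFlops/W")
```

It prints roughly 2.16 GFlops/W for the C2050 and 2.09 GFlops/W for the C2070.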

The next GPU from Nvidia is code-named Kepler, after the mathematician and astronomer Johannes Kepler. Kepler will be released sometime in 2011 and will be manufactured on a 28 nm process. If the top edge of the Kepler block on the chart is the correct value, then Nvidia is estimating double-precision performance of about 5.7 GFlops per watt. Taking 250 watts as the maximum electrical power budget for a board, the **Kepler C3070 will be able to compute at 1.425 Tflops in double precision** (5.7 GFlops/W × 250 W). Yes, I made up the name C3070; I just followed Nvidia’s standard naming convention, which will likely change. That makes Kepler about 2.8 times faster than the Fermi C2070.

The follow-on GPU to Kepler will be Maxwell, named for the physicist James Clerk Maxwell. Maxwell will be released sometime in 2013 and will be manufactured on a 22 nm process. If the top edge of the Maxwell block on the chart is the correct value, then Nvidia is estimating double-precision performance of about 15.7 GFlops per watt. With the same 250-watt power budget, the **Maxwell C4070 will be able to compute at 3.925 Tflops in double precision**. Again, I made up the name C4070. That makes Maxwell about 7.6 times faster than the Fermi C2070.
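Both projections follow the same recipe: the per-watt figure read off the chart, times an assumed 250 W board budget, compared against the 515.2 GFlops C2070. A minimal Python sketch, where the 5.7 and 15.7 GFlops/W values are chart readings rather than specifications and `projected_peak_gflops` is a hypothetical helper:

```python
# Peak of the shipping Fermi C2070 in GFlops, used as the baseline.
FERMI_C2070_GFLOPS = 515.2

# Assumed board power budget in watts (the 250 W figure from the text).
POWER_BUDGET_W = 250.0

# Double-precision GFlops per watt read off the top of each block on the
# roadmap chart; these are estimates, not shipping specifications.
ROADMAP_GFLOPS_PER_WATT = {"Kepler": 5.7, "Maxwell": 15.7}


def projected_peak_gflops(gflops_per_watt: float, watts: float = POWER_BUDGET_W) -> float:
    """Peak double-precision GFlops if a board spends its whole power budget."""
    return gflops_per_watt * watts


for chip, efficiency in ROADMAP_GFLOPS_PER_WATT.items():
    peak = projected_peak_gflops(efficiency)
    speedup = peak / FERMI_C2070_GFLOPS  # ratio to the C2070 peak
    print(f"{chip}: {peak / 1000:.3f} Tflops, {speedup:.1f}x a C2070")
```

With these inputs the loop reports 1.425 Tflops for Kepler and 3.925 Tflops for Maxwell, together with each chip’s ratio to the C2070.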

Now suppose we want to know how much computing power this would give us in a rack, and use that to build a high performance computer.

Right now you can place 84 C2070s in a 42U rack, which uses about 21 kW of electrical power (84 × 247 W). A rack full of Fermis would deliver approximately 43 Tflops of GPU compute power. A rack full of Keplers would deliver approximately 120 Tflops, and a rack full of Maxwells approximately 330 Tflops. This back-of-the-napkin design ignores the compute power of the Intel or AMD host processors.
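The rack totals are just board count times per-board peak. A minimal Python sketch, assuming 84 boards per rack (two per 1U server) and the per-board peaks from the previous paragraphs; C3070 and C4070 are the made-up names used above, and `rack_gpu_tflops` is a hypothetical helper.

```python
# Boards per 42U rack, as in the text (assumed two GPUs per 1U server).
GPUS_PER_RACK = 84

# Peak double-precision GFlops per board: shipping Fermi C2070 plus the
# hypothetical C3070 (Kepler) and C4070 (Maxwell) projections at 250 W.
PEAK_GFLOPS_PER_GPU = {
    "Fermi C2070": 515.2,
    "Kepler C3070": 1425.0,
    "Maxwell C4070": 3925.0,
}


def rack_gpu_tflops(peak_gflops_per_gpu: float, gpus: int = GPUS_PER_RACK) -> float:
    """Aggregate GPU peak for one rack in Tflops (host CPUs are ignored)."""
    return gpus * peak_gflops_per_gpu / 1000.0


for name, gflops in PEAK_GFLOPS_PER_GPU.items():
    print(f"{name}: {rack_gpu_tflops(gflops):.1f} Tflops per rack")

# GPU power for the C2070 rack: 84 boards at 247 W each.
print(f"C2070 rack GPU power: {GPUS_PER_RACK * 247 / 1000:.1f} kW")
```

This prints about 43.3, 119.7 and 329.7 Tflops per rack, and about 20.7 kW of GPU power for the C2070 rack.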

So let’s add the CPU compute power back in to see what we get.

Intel Xeon X5680: 3.33 GHz × 4 double-precision flops/clock × 6 cores × 2 sockets ≈ 160 GFlops per 1U server

In a 42U rack of these 1U servers we would then have 6.72 Tflops of CPU compute power at a cost of 10.9 kW. The total peak compute power per rack today would be about 50 Tflops at a cost of about 40 kW of electrical power (once you add in networking and miscellaneous gear). A petaflop machine would be about 20 racks. With Kepler (about 126 Tflops per rack), a petaflop machine would be about 8 racks. With Maxwell (about 336 Tflops per rack), it would be about 3 racks. This ignores any improvements in Intel’s CPUs, of course.
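A minimal Python sketch of the per-rack totals and the racks needed for a peak petaflop, assuming one dual-socket X5680 server per U hosting two GPUs and counting only peak flops (power is left out). The helper `racks_per_target` is hypothetical, and it returns fractional racks that the text rounds.

```python
# Peak double-precision GFlops for a 1U dual-socket Xeon X5680 server,
# using the text's formula: 3.33 GHz x 4 flops/clock x 6 cores x 2 sockets.
CPU_GFLOPS_PER_1U = 3.33 * 4 * 6 * 2  # ~159.8 GFlops

RACK_UNITS = 42  # one 1U server per slot in a 42U rack

# GPU peak per rack in Tflops: 84 boards times the per-board peak.
GPU_TFLOPS_PER_RACK = {
    "Fermi": 84 * 515.2 / 1000,
    "Kepler": 84 * 1425.0 / 1000,
    "Maxwell": 84 * 3925.0 / 1000,
}


def racks_per_target(target_tflops: float, tflops_per_rack: float) -> float:
    """Fractional number of racks needed to reach a peak target."""
    return target_tflops / tflops_per_rack


cpu_tflops_per_rack = RACK_UNITS * CPU_GFLOPS_PER_1U / 1000  # ~6.7 Tflops

for gpu, gpu_tflops in GPU_TFLOPS_PER_RACK.items():
    rack_total = gpu_tflops + cpu_tflops_per_rack
    racks = racks_per_target(1000.0, rack_total)  # 1 petaflop = 1000 Tflops
    print(f"{gpu}: {rack_total:.1f} Tflops per rack, "
          f"{racks:.1f} racks per petaflop")
```

The output works out to about 20.0, 7.9 and 3.0 racks per petaflop for Fermi, Kepler and Maxwell respectively.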

An exaflop machine using Maxwells would be about 3,000 racks and would consume about 120 MW of electrical power (3,000 racks × 40 kW). Given that 20 MW is the design constraint on such a machine, we still have a long way to go before reaching an exaflop machine. Such a machine is estimated to arrive by 2019.
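The exaflop estimate divides 10^6 Tflops by the Maxwell rack total and multiplies by the per-rack power. A minimal Python sketch, assuming the ~336 Tflops Maxwell rack from above, the 40 kW per-rack estimate and the 20 MW design target; all constant names are illustrative.

```python
# Peak Tflops per Maxwell rack: 84 GPUs at 3.925 Tflops plus 42 Xeon
# servers at 3.33 GHz x 4 flops/clock x 6 cores x 2 sockets.
MAXWELL_RACK_TFLOPS = 84 * 3.925 + 42 * 3.33 * 4 * 6 * 2 / 1000
RACK_POWER_KW = 40.0      # per-rack estimate including networking
POWER_BUDGET_MW = 20.0    # exascale design target quoted in the text

EXAFLOP_TFLOPS = 1_000_000.0  # 1 exaflop = 10^6 Tflops

racks = EXAFLOP_TFLOPS / MAXWELL_RACK_TFLOPS
power_mw = racks * RACK_POWER_KW / 1000
print(f"{racks:,.0f} racks, {power_mw:.0f} MW, "
      f"{power_mw / POWER_BUDGET_MW:.1f}x the power budget")
```

It prints roughly 2,973 racks and 119 MW, in line with the round figures in the paragraph.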

Now, as we all know, peak compute performance is the number the salesman promises you and you will never reach. The current efficiency for hybrid CPU/GPU HPC machines is in the 30% to 35% range, so a sustained exaflop would take roughly three times the peak hardware. Somehow I do not see a 9,000-rack machine consuming 360 MW of electrical power arriving at any location any time soon. Coming back to reality, you can see why petaflop machines will become a commodity in the very near future.
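Scaling peak figures to sustained ones is a single division by the efficiency. A minimal Python sketch, assuming an efficiency of one third (the middle of the quoted 30% to 35% range), the round 3,000-rack exaflop and 3-rack petaflop Maxwell figures, and 40 kW per rack; `sustained_racks` is a hypothetical helper.

```python
# Sustained-vs-peak efficiency for hybrid CPU/GPU machines, taken as ~1/3,
# roughly the middle of the 30%-35% range quoted in the text.
EFFICIENCY = 1 / 3

PEAK_RACKS_PER_EXAFLOP = 3000  # round figure for Maxwell racks at peak
RACK_POWER_KW = 40.0           # per-rack estimate including networking


def sustained_racks(peak_racks: float, efficiency: float) -> float:
    """Racks needed when only a fraction of peak is actually delivered."""
    return peak_racks / efficiency


racks = sustained_racks(PEAK_RACKS_PER_EXAFLOP, EFFICIENCY)
print(f"Sustained exaflop: {racks:,.0f} racks, {racks * RACK_POWER_KW / 1000:.0f} MW")

# The same adjustment for a sustained petaflop on Maxwell (~3 racks at peak).
print(f"Sustained petaflop: about {sustained_racks(3, EFFICIENCY):.0f} racks")
```

The exaflop line reproduces the 9,000 racks and 360 MW above, while a sustained Maxwell petaflop still fits in about 9 racks.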