Nvidia’s Fermi at SC09

November 18, 2009

In case you do not know, Fermi is the code name for the next-generation GPU from Nvidia. The Fermi chip consists of 3 billion transistors and is manufactured at TSMC on a 40 nm process. By way of comparison, the previous generation of Nvidia graphics cards was the GeForce 200 series, whose GPU consists of 1.4 billion transistors and is built on a 65 nm process. One can easily see that Fermi has more than twice as many transistors as the previous generation.

By way of yet another comparison, ATI has already released its latest graphics card, the 5870. It consists of 2.15 billion transistors and is manufactured at TSMC on a 40 nm process. With the release of the 5870, ATI more than doubled the number of transistors from the previous generation, the 4890, which had only 959 million.
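
The "more than doubled" claims for both vendors can be checked with a few lines of arithmetic. This is only a sketch using the transistor counts quoted above; the dictionary keys and the `growth` helper are names made up for this illustration.

```python
# Transistor counts quoted in this article, in billions.
transistors = {
    "Fermi": 3.0,
    "GeForce 200 series": 1.4,
    "ATI 5870": 2.15,
    "ATI 4890": 0.959,
}


def growth(new, old):
    """Ratio of transistor counts between a chip and its predecessor."""
    return transistors[new] / transistors[old]


nvidia_growth = growth("Fermi", "GeForce 200 series")
ati_growth = growth("ATI 5870", "ATI 4890")

print(f"Nvidia generation-over-generation growth: {nvidia_growth:.2f}x")
print(f"ATI generation-over-generation growth: {ati_growth:.2f}x")
```

Both ratios come out above 2, so each vendor did more than double its transistor count in one generation.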

Nvidia’s Fermi architecture consists of 16 streaming multiprocessors (SMs). Each streaming multiprocessor consists of 32 streaming processors (SPs). A streaming processor is now called a CUDA core. That gives Fermi a total of 512 CUDA cores. Each core can execute one floating-point operation per clock. Each SM has 16 load-store units, which are used for memory operations. Each SM also has 64 KB of on-chip memory that is split between shared memory and first-level cache, and all SMs share a 768 KB second-level cache.
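
The unit counts above multiply out as follows. This is a rough sketch, not a performance model: the constants come from the paragraph above, while `peak_ops_per_second` and its `clock_ghz` parameter are hypothetical, since Nvidia has not disclosed the clock speed.

```python
# Unit counts quoted in the article.
SM_COUNT = 16
CORES_PER_SM = 32
LOAD_STORE_UNITS_PER_SM = 16

cuda_cores = SM_COUNT * CORES_PER_SM
load_store_units = SM_COUNT * LOAD_STORE_UNITS_PER_SM


def peak_ops_per_second(clock_ghz):
    """Peak rate at one floating-point operation per core per clock.

    clock_ghz is a placeholder input; the real clock is not public.
    """
    return cuda_cores * clock_ghz * 1e9


print(f"CUDA cores: {cuda_cores}")
print(f"Load-store units: {load_store_units}")
# 1.0 GHz is an arbitrary illustration value, not a disclosed figure.
print(f"Peak at an assumed 1.0 GHz: {peak_ops_per_second(1.0):.3e} ops/s")
```

Once the real clock is announced, plugging it into `peak_ops_per_second` gives the corresponding upper bound under the one-operation-per-clock assumption.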

Fermi Unit in an HP Z800 at SC09

What Nvidia is not yet saying publicly is how fast the clock is. However, Nvidia is showing a demonstration program on the floor of Supercomputing 2009 in Portland, Oregon. On one HP Z800 computer, a Tesla C1060 is running a CUDA n-body demonstration. This well-known CUDA program simulates 20,480 interacting bodies in double precision. It computes 1.47 billion interactions per second at a frame rate of 3.5 frames per second. On the other HP Z800 computer, a Fermi is running the same program at 9.11 billion interactions per second and a frame rate of 21.72 frames per second.
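
The interaction rates and frame rates are two views of the same measurement. The sketch below assumes the demo counts every body interacting with every body once per frame (N × N interactions); that counting rule is an assumption, but it reproduces the figures shown on the floor. The function and variable names are invented for this example.

```python
N_BODIES = 20_480


def interactions_per_second(n_bodies, frames_per_second):
    """All-pairs interaction rate, assuming n_bodies squared interactions per frame."""
    return n_bodies * n_bodies * frames_per_second


tesla_rate = interactions_per_second(N_BODIES, 3.5)
fermi_rate = interactions_per_second(N_BODIES, 21.72)

print(f"Tesla C1060: {tesla_rate / 1e9:.2f} billion interactions per second")
print(f"Fermi: {fermi_rate / 1e9:.2f} billion interactions per second")
```

Under that assumption, the frame rates alone are enough to recover both quoted interaction rates.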

In other words, the Fermi GPU is running 6.2 times faster, whether you look at interactions per second or frames per second. Nvidia says that Fermi can run double precision up to 8x faster. In Fermi, double precision runs at half the speed of single precision. In the previous generation, the GT200, double precision ran at roughly one-tenth of single-precision speed. That is because each SM in the GT200 had only one dedicated double-precision unit alongside its eight single-precision units (SPs).
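
For readers who want to check the 6.2x figure, here is a small sketch using the numbers from the show-floor demo. The variable names are made up for this example, and the "share of the 8x claim" is simply the measured speedup divided by Nvidia's stated maximum.

```python
# Figures from the two HP Z800 demo machines.
tesla_interactions, fermi_interactions = 1.47, 9.11  # billions per second
tesla_fps, fermi_fps = 3.5, 21.72

speedup_interactions = fermi_interactions / tesla_interactions
speedup_fps = fermi_fps / tesla_fps

# Nvidia's stated upper bound for double-precision speedup.
CLAIMED_MAX_SPEEDUP = 8.0
share_of_claim = speedup_fps / CLAIMED_MAX_SPEEDUP

print(f"Speedup by interactions per second: {speedup_interactions:.1f}x")
print(f"Speedup by frames per second: {speedup_fps:.1f}x")
print(f"Share of the claimed 8x: {share_of_claim:.0%}")
```

Both measures agree to one decimal place, and the demo reaches roughly three quarters of the stated 8x ceiling.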

Will Fermi be faster in gaming than the ATI 5870? What about the 600 lb. gorilla from Intel called Larrabee? We do not know for certain yet. One thing is sure: it will be an interesting ride.