The developers of CULAtools, a linear algebra library designed and optimized for CUDA architectures, got their hands on the Fermi-based NVIDIA Tesla C2050 and put it through its paces. Without touching the source code, changing only a few compiler flags, they got huge performance gains.

As you can see, Fermi is no slouch! We’re reporting performance gains for doubles up to 3x over the previous generation of Tesla GPUs. It’s also very important to note that these gains are achieved with no Fermi-specific optimizations added — these are practically plug-and-play performance enhancements. We have every expectation that with a little time and effort we can improve significantly upon these already impressive numbers.

via CULAtools – Initial Fermi Performance.