NVIDIA has published a whitepaper on their upcoming Fermi (GT300) architecture, detailing the new features and various case-study results. The cliff-notes version:

  • 32 CUDA Cores per Streaming Multiprocessor
  • 8x the peak double precision performance of GT200
  • Unified Address Space with full C++ Support
  • Full IEEE 754-2008 32-bit and 64-bit Precision
  • Predication
  • ECC Memory Support
  • 10x faster context switching
  • Out-of-Order thread block execution
  • 3 Billion Transistors
  • 768 KB L2 Cache (didn’t exist in GT200)
  • Up to 16 concurrent kernels (didn’t exist in GT200)
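Where does the "8x" double precision figure come from? A rough per-clock sketch follows. It assumes figures that are not stated in this post but come from NVIDIA's published specs: GT200 has 30 SMs with one double precision unit each, while Fermi has 16 SMs of 32 CUDA cores, with double precision running at half the single precision rate (16 DP fused multiply-adds per SM per clock). Actual FLOPS also depend on shipping clock speeds, so treat this as a per-clock comparison only.

```python
# Per-clock double-precision FMA throughput, GT200 vs. Fermi.
# ASSUMED figures (from NVIDIA's published specs, not from this post):
#   GT200: 30 SMs, 1 DP FMA per SM per clock
#   Fermi: 16 SMs x 32 CUDA cores, DP at half the SP rate
#          -> 16 DP FMAs per SM per clock

GT200_SMS = 30
GT200_DP_FMA_PER_SM = 1

FERMI_SMS = 16
FERMI_CORES_PER_SM = 32
FERMI_DP_FMA_PER_SM = FERMI_CORES_PER_SM // 2  # half-rate double precision


def dp_fma_per_clock(sms: int, fma_per_sm: int) -> int:
    """Peak double-precision fused multiply-adds issued per clock, chip-wide."""
    return sms * fma_per_sm


gt200 = dp_fma_per_clock(GT200_SMS, GT200_DP_FMA_PER_SM)  # 30
fermi = dp_fma_per_clock(FERMI_SMS, FERMI_DP_FMA_PER_SM)  # 256
ratio = fermi / gt200

print(f"Fermi CUDA cores: {FERMI_SMS * FERMI_CORES_PER_SM}")
print(f"GT200: {gt200} DP FMA/clock, Fermi: {fermi} DP FMA/clock")
print(f"Per-clock speedup: {ratio:.2f}x")
```

Under these assumptions the ratio works out to about 8.5x per clock, which lines up with the rounded "8x" headline.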

Looks like Fermi is going to be an awesome processor to work with. Read the full report on their site.

NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf