NVIDIA has today released the newest version of their popular CUDA Toolkit, version 3.2, which boasts all-around performance improvements and several new features.  The new version includes a new sparse matrix library, ‘CUSPARSE’, to complement the common CUBLAS and CULAPACK libraries that excel at dense matrices.  They also have a new GPU-accelerated random-number library, ‘CURAND’.  GPU-accelerated random numbers may seem a bit pointless at first glance, but random number entropy is a big deal in large-scale crypto, so I’m sure certain government labs will love that feature.  Even that’s not all: they’ve added some nice cluster management features (allowing admins to lock processes to certain GPUs, a necessary feature in queue-driven clusters) as well as support for 64-bit memory addressing, which opens up the full 6GB of memory available on the Quadro 6000.

In addition, they’ve just announced the new version of Parallel Nsight, v1.5, which includes compatibility with Microsoft Visual Studio 2010.  The new version offers a new “Dual GPU” mode that enables the Compute Debugger on a system with two suitable GPUs, a feature previously reserved for network debugging or Multi-OS SLI systems.  It adds support for the new Fermi hardware (GTS460 and such) and all of the features of CUDA 3.2.

For those of you in the GPU compute space, however, the big news may be the new ‘TCC’ driver.  For a while now, NVIDIA has offered a special ‘Tesla Compute Cluster’ driver that enables CUDA and GPU support without dragging in the Windows display subsystem.  It was initially intended to overcome some problems with Windows’ strange requirements for hardware access when using Remote Desktop and in cluster systems like Windows HPC Server: the driver loads the Tesla card (or Quadro card, if you really want to) not as a display device, but as an additional compute card installed in the system.  Along the way, NVIDIA found some interesting, unintended side effects in how Windows deals with it.  When working through the Windows display system and the WDDM (Windows Display Driver Model), you are required to bundle all of your kernels together before you load them to the card, with each kernel taking approximately 30 microseconds.  If you instead go through the Windows Driver Model (WDM), you can load kernels when convenient, and each one takes only approximately 2.5 microseconds.  That means that for a complex workload requiring 10 compute kernels:

  • WDDM: 30 microseconds * 10 kernels = 300 microseconds
  • WDM: 2.5 microseconds * 10 kernels = 25 microseconds
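
To see how those per-kernel figures scale beyond 10 kernels, here is a rough Python sketch of the same arithmetic.  It assumes the overhead is a fixed cost per kernel and grows linearly, which is a simplification; the function name `total_launch_overhead_us` and the larger kernel counts are made up for illustration, and only the 30 and 2.5 microsecond figures come from the numbers above.

```python
# Per-kernel overhead figures quoted above, in microseconds.
# Assumption: the cost is constant per kernel and simply adds up.
LAUNCH_OVERHEAD_US = {"WDDM": 30.0, "WDM": 2.5}


def total_launch_overhead_us(driver_model: str, kernels: int) -> float:
    """Return the total overhead in microseconds for the given number of kernels."""
    return LAUNCH_OVERHEAD_US[driver_model] * kernels


if __name__ == "__main__":
    # 10 kernels reproduces the list above; the larger counts are illustrative.
    for kernels in (10, 1_000, 1_000_000):
        wddm = total_launch_overhead_us("WDDM", kernels)
        wdm = total_launch_overhead_us("WDM", kernels)
        print(f"{kernels:>9} kernels: WDDM {wddm:.1f} us, "
              f"WDM {wdm:.1f} us, difference {wddm - wdm:.1f} us")
```

Under that linear assumption, the gap between the two driver models stays at a factor of 12 no matter how many kernels are involved, so the absolute time lost grows in step with the kernel count.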

For people doing very heavy GPU computation, this adds up fast.  However, users found themselves having to make a choice: load up the TCC driver and lose all display support, or load up the display driver and live with the slower kernel loading.

No more, as the new driver enables a run-time switch that can toggle between Display mode and TCC mode.  Now you can take your dual Quadro system and run in graphics SLI mode for superior performance, then switch one of your Quadros to TCC mode and run your compute codes faster.  Granted, it’s not a situation many people find themselves in, but for the few who do, it’s a welcome change.

Parallel Nsight 1.5 will be available next week, on September 22nd (conveniently, during GTC).

Full release after the break.

NVIDIA Announces Parallel Nsight Support for Visual Studio 2010 and Up to 300% Performance Increase in CUDA Toolkit Libraries

Parallel Nsight 1.5 Provides Enhanced Parallel Computing Capabilities; CUDA Toolkit Version 3.2 Boosts Performance with New and Improved Math Libraries

Santa Clara, CA – September 14, 2010 – Today NVIDIA extended its leadership in GPU computing with the announcement of new versions of its two industry-leading developer tools:  Parallel Nsight and the CUDA Toolkit.

Parallel Nsight is the only integrated development environment for creating GPU-accelerated applications for a range of desktop and supercomputing platforms.  Parallel Nsight version 1.5 includes support for Microsoft Visual Studio 2010, Tesla Compute Cluster (TCC) debugging, the updated CUDA Toolkit version 3.2, full support for NVIDIA’s recently announced, high-performance Fermi GPU architecture, and other advanced debugging and analysis capabilities.  The new CUDA Toolkit 3.2 release includes two new math libraries, significant performance improvements and support for the new 6GB Tesla and Quadro products.

A short video overview of the new features in CUDA Toolkit 3.2 and Parallel Nsight 1.5 is available at:

http://developer.download.nvidia.com/CUDA/training/CUDAToolkit_and_ParallelNsight_Update_Sept2010.mp4

Parallel Nsight 1.5 Standard edition will be available as a free update on September 22.  In addition, a release candidate of the Professional edition, which includes all Standard edition features plus additional capabilities, including the System Analysis functionality, will also be available.  For more information about Parallel Nsight 1.5, please visit: www.nvidia.com/ParallelNsight.

The CUDA Toolkit includes all the tools, libraries and documentation developers need to build CUDA C/C++ applications, and is the foundation for many other GPU computing language solutions.  In addition to delivering up to 300 percent faster FFT and BLAS performance compared with the previous release, the new CUDA Toolkit 3.2 release includes new libraries for sparse matrix multiplication, random number generation, H.264 encode/decode, and new cluster management features.

For more information on the free CUDA Toolkit please visit: www.nvidia.com/getcuda.

CUDA and Parallel Nsight at GTC

With more than 280 hours of GPU-focused sessions, six sessions on Parallel Nsight and more than 25 sessions on CUDA C/C++ development, NVIDIA’s GPU Technology Conference (GTC) will provide a wealth of information on GPU computing news, developments and achievements.  In addition, experts from NVIDIA and Microsoft will be providing hands-on training and educational sessions on Parallel Nsight, Visual Studio 2010, Windows HPC Server 2008, and CUDA C/C++ development at the Parallel Nsight Lounge by Microsoft (Sept. 20-23, 10 a.m. – 8 p.m.).  For more information about the Lounge and GTC please visit www.nvidia.com/gtc.

Press Contact:
George Millington
NVIDIA Corporation
(408) 562-7226

About NVIDIA
NVIDIA (NASDAQ: NVDA) awakened the world to the power of computer graphics when it invented the GPU in 1999. Since then, it has consistently set new standards in visual computing with breathtaking, interactive graphics available on devices ranging from tablets and portable media players to notebooks and workstations. NVIDIA’s expertise in programmable GPUs has led to breakthroughs in parallel processing which make supercomputing inexpensive and widely accessible. The company holds more than 1,100 U.S. patents, including ones covering designs which are fundamental to modern computing. For more information, see www.nvidia.com.