AMD has just released the newest version of ACML-GPU (AMD Core Math Library for Graphic Processors), which offers GPU-optimized versions of several popular math routines, such as SGEMM and DGEMM. Unlike other options, these routines can automagically switch between GPU and CPU execution depending on the problem size and the hardware available.

AMD Core Math Library for Graphic Processors (ACML-GPU) provides an ATI Stream-accelerated version of ACML. ACML-GPU accelerates certain routines in ACML, such as SGEMM and DGEMM, by off-loading the computation to the compatible GPUs in the system. The library dynamically decides, based on the parameters passed to the routines, whether to run the computation on the CPU or GPU, depending on which processor will yield the best performance.
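To make the dispatch idea concrete, here is a minimal sketch in C of a size-based CPU/GPU decision wrapped around a plain SGEMM. It is not AMD's implementation: the names `choose_backend` and `sgemm_dispatch`, the `backend_t` type, and the `GPU_MIN_WORK` cutoff are all hypothetical, and the real heuristic inside ACML-GPU is not described in the announcement. The sketch contains no GPU code, so both paths fall back to a reference CPU loop and the function only reports which path it would have taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend tag; ACML-GPU makes this choice internally. */
typedef enum { BACKEND_CPU, BACKEND_GPU } backend_t;

/* Assumed cutoff in multiply-adds below which offloading is not worth the
 * transfer cost. The value is illustrative only, not taken from ACML-GPU. */
#define GPU_MIN_WORK (512ULL * 512ULL * 512ULL)

/* Pick a backend from the problem size and hardware availability. */
backend_t choose_backend(int m, int n, int k, int gpu_available)
{
    if (!gpu_available)
        return BACKEND_CPU;
    unsigned long long work = (unsigned long long)m * n * k;
    return work >= GPU_MIN_WORK ? BACKEND_GPU : BACKEND_CPU;
}

/* Reference column-major SGEMM without transposes:
 * C = alpha * A * B + beta * C, with A (m x k), B (k x n), C (m x n). */
static void sgemm_reference(int m, int n, int k, float alpha,
                            const float *a, int lda,
                            const float *b, int ldb,
                            float beta, float *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            c[i + (size_t)j * ldc] = alpha * acc + beta * c[i + (size_t)j * ldc];
        }
    }
}

/* Hypothetical dispatcher: decides on a backend, runs the computation, and
 * returns the decision. A real library would launch a GPU kernel for
 * BACKEND_GPU; this sketch runs the CPU reference in both cases. */
backend_t sgemm_dispatch(int m, int n, int k, float alpha,
                         const float *a, int lda,
                         const float *b, int ldb,
                         float beta, float *c, int ldc,
                         int gpu_available)
{
    /* Leading dimensions must cover the column height of each matrix. */
    assert(lda >= m && ldb >= k && ldc >= m);
    backend_t chosen = choose_backend(m, n, k, gpu_available);
    sgemm_reference(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    return chosen;
}
```

With ACML-GPU itself the caller writes none of this: per the description above, the decision happens inside the library routine, so existing code that calls SGEMM or DGEMM should pick up the GPU path without changes when the problem is large enough.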

It requires the ATI Stream SDK, but works with PGI Fortran, Visual Studio C, and GCC on Windows and Linux.

via AMD Core Math Library for Graphic Processors (ACML-GPU) | AMD Developer Central.