Google Code is hosting a project from the University of Virginia that claims to be the fastest sorting algorithm ever, taking advantage of GPUs.

This project implements a very fast, efficient radix sorting method for CUDA-capable devices. For sorting large sequences of fixed-length keys (and values), we believe our sorting primitive to be the fastest available for any fully-programmable microarchitecture: our stock NVIDIA GTX480 sorting results exceed the 1G keys/sec average sorting rate (i.e., one billion 32-bit keys sorted per second).
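For readers unfamiliar with radix sorting, here is a minimal sequential sketch of a least-significant-digit (LSD) radix sort over 32-bit unsigned keys that carries values along with them. This is not the project's CUDA code. It is a plain CPU illustration, and the function name `radix_sort_pairs` and the 8-bit digit width are assumptions chosen for clarity. Each pass does a histogram, a prefix sum, and a stable scatter. GPU radix sorts are commonly organized around those same three steps, with the work spread across many threads.

```python
def radix_sort_pairs(keys, values, digit_bits=8):
    """Hypothetical LSD radix sort of 32-bit unsigned keys, carrying values along.

    A sequential illustration only; digit_bits=8 is an assumed digit width.
    """
    assert len(keys) == len(values)
    radix = 1 << digit_bits
    mask = radix - 1
    # One pass per digit, starting from the least significant bits.
    for shift in range(0, 32, digit_bits):
        # 1. Histogram: count how many keys have each value of this digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum: starting output offset for each digit value.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # 3. Stable scatter into fresh buffers (order of equal digits is kept).
        out_keys = [0] * len(keys)
        out_vals = [None] * len(values)
        for k, v in zip(keys, values):
            d = (k >> shift) & mask
            out_keys[offsets[d]] = k
            out_vals[offsets[d]] = v
            offsets[d] += 1
        keys, values = out_keys, out_vals
    return keys, values


# Usage: values follow their keys into sorted order.
ks, vs = radix_sort_pairs([170, 45, 75, 90, 802, 24, 2, 66],
                          ["a", "b", "c", "d", "e", "f", "g", "h"])
print(ks)  # [2, 24, 45, 66, 75, 90, 170, 802]
print(vs)  # ['g', 'f', 'b', 'h', 'c', 'd', 'a', 'e']
```

Because each pass is stable, sorting by successive digits from least to most significant yields a fully sorted sequence after four 8-bit passes.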

In addition, one of our design goals for this project is flexibility. We’ve designed our implementation to adapt itself and perform well on all generations and configurations of programmable NVIDIA GPUs, and for a wide variety of input types.

They have some great detail on the website, but it looks like their algorithm requires the entire input to fit in GPU memory: they specify a maximum input size of 272M keys, which at 4 bytes per 32-bit key works out to just over 1 GB.
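The memory arithmetic is easy to check. The sketch below assumes keys only (no values) and computes the footprint under both readings of "M": decimal millions (10^6) and binary mebi-units (2^20). Either way the result lands just over 1 GB. The helper name `footprint_bytes` is hypothetical.

```python
def footprint_bytes(n_keys, bytes_per_key=4):
    """Hypothetical helper: raw storage for n_keys fixed-size keys."""
    return n_keys * bytes_per_key

# "272M" read as 272 * 10**6 keys versus 272 * 2**20 keys.
decimal = footprint_bytes(272 * 10**6)
binary = footprint_bytes(272 * 2**20)

print(decimal / 10**9)  # 1.088  (GB)
print(binary / 2**30)   # 1.0625 (GiB)
```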

via RadixSorting – back40computing – High performance GPU radix sorting in CUDA – Project Hosting on Google Code.