AMD has published a “whitepaper” (it irks me that they call these whitepapers when they’re actually PowerPoint presentations) discussing optimizations for image convolution algorithms on both CPU and GPU. They start with an algorithm, add some optimizations for the memory overlap, and then naively port it to a Radeon 5870, where it runs in 1511 ms. Then, with some careful optimizations, they work it down to a mere 182 ms!
Randall Hand
Randall Hand is a computer graphics programmer and news junky who's been working in the field for the last 15 years. He's responsible for visualizations generated on some of the most powerful supercomputers in the world, ytnef, mullion support in ParaView, and VizWorld.com.