
InsideHPC dug up a nice paper from the ACM PLDI conference that discusses a prototype CPU-GPU communication optimization tool called ‘CGCM’.  From their conclusions:

CGCM has two parts, a run-time library and an optimizing compiler.  The run-time library’s semantics allow the compiler to manage and optimize CPU-GPU communication without programmer annotations or heroic static analysis. The compiler breaks cyclic communication patterns by transferring data to the GPU early in the program and retrieving it only when necessary. CGCM outperforms inspector-executor systems on 24 programs and enables a whole program geomean speedup of 5.36x over best sequential CPU-only execution.
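To make the "breaks cyclic communication patterns" point concrete, here is a minimal CUDA sketch of that style of transformation. This is my own illustration, not code from the paper: the kernel, array size, and iteration count are all made up. The commented-out loop round-trips the array between host and device on every iteration; the version below it sends the data to the GPU once, early, and retrieves it only when the CPU actually needs the result.

```cuda
// Illustrative only: the kind of transfer hoisting CGCM performs,
// shown by hand. step_kernel, N, and ITERS are hypothetical.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define N     (1 << 20)
#define ITERS 100

// Toy kernel: one update step over the array.
__global__ void step_kernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] * 0.5f + 1.0f;
}

int main(void) {
    float *h = (float *)malloc(N * sizeof(float));
    for (int i = 0; i < N; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc(&d, N * sizeof(float));

    // Naive cyclic pattern: round-trip the data every iteration.
    // for (int it = 0; it < ITERS; ++it) {
    //     cudaMemcpy(d, h, N * sizeof(float), cudaMemcpyHostToDevice);
    //     step_kernel<<<(N + 255) / 256, 256>>>(d, N);
    //     cudaMemcpy(h, d, N * sizeof(float), cudaMemcpyDeviceToHost);
    // }

    // Hoisted pattern in the spirit of CGCM: transfer once, early,
    // and retrieve only when the CPU actually reads the data.
    cudaMemcpy(d, h, N * sizeof(float), cudaMemcpyHostToDevice);
    for (int it = 0; it < ITERS; ++it)
        step_kernel<<<(N + 255) / 256, 256>>>(d, N);
    cudaMemcpy(h, d, N * sizeof(float), cudaMemcpyDeviceToHost);

    printf("h[0] = %f\n", h[0]);
    cudaFree(d);
    free(h);
    return 0;
}
```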

Impressive results, to say the least.  Their examples appear to be based on CUDA code rather than OpenCL, although they do compare their approach to annotation-based tools like OpenMP directives and CUDA-lite.  In those tools it is the programmer’s responsibility to mark the data to be shared between the GPU and CPU; CGCM’s run-time library instead handles that bookkeeping itself, so data moves between the two systems transparently.
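As a rough picture of what that run-time bookkeeping might look like, here is another CUDA sketch under stated assumptions: map_to_device and sync_to_host are hypothetical helpers I invented for illustration, not CGCM’s actual API. The idea is simply to cache one device copy per host buffer and copy back lazily, so repeated kernel launches don’t pay a per-iteration round trip.

```cuda
// Illustrative sketch (not CGCM's API) of a run-time mapping from
// host buffers to cached device copies, giving transparent transfers.
#include <cuda_runtime.h>
#include <cstdio>
#include <map>

static std::map<void *, void *> g_map;  // host ptr -> device ptr

// Return the device copy of a host buffer, copying on first use.
void *map_to_device(void *host, size_t bytes) {
    auto it = g_map.find(host);
    if (it != g_map.end()) return it->second;  // already resident on GPU
    void *dev = nullptr;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    g_map[host] = dev;
    return dev;
}

// Copy a buffer back only when the CPU actually reads it.
void sync_to_host(void *host, size_t bytes) {
    auto it = g_map.find(host);
    if (it != g_map.end())
        cudaMemcpy(host, it->second, bytes, cudaMemcpyDeviceToHost);
}

__global__ void scale(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main(void) {
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // Repeated launches reuse the cached device copy: no per-iteration
    // host<->device round trip.
    float *d = (float *)map_to_device(h, sizeof(h));
    for (int it = 0; it < 10; ++it)
        scale<<<(n + 255) / 256, 256>>>(d, n);

    sync_to_host(h, sizeof(h));   // retrieve only when needed
    printf("h[0] = %f\n", h[0]);  // prints 1024.0 after ten doublings
    return 0;
}
```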

It’s available as a PDF.

New Paper: Automatic CPU-GPU Communication Management and Optimization | insideHPC.com.