Google’s newest mobile OS, Android 4.0, brings lots of improvements to the UI and the guts of the system. However, one thing many people don’t know is that the new OS, combined with newer ARM hardware, enables one more exciting feature: GPU computing with the RenderScript API. That alone is impressive, but combined with some unique hardware features it could prove really amazing. Check out the new memory and cache system supported on the new Mali-T604 (rumored to be the guts of Samsung’s upcoming products):

The ARM Mali-T604 GPU is designed to work with the latest version (4) of the AMBA (Advanced Microcontroller Bus Architecture) which features Cache Coherent Interconnect (CCI). Data shared between processors in the system, a natural occurrence in heterogeneous computing, no longer requires costly (in terms of cycles and energy) synchronization via external memory and explicit cache maintenance operations. All of this is now performed in hardware, and is enabled transparently inside the drivers. In addition to reduced memory traffic, CCI avoids superfluous sharing of data: only data genuinely requested by another master is transferred to it, to the granularity of a cache line. No need to flush a whole buffer or data structure anymore.

These memory flushes are among the worst things about modern GPU and GPGPU systems: one little conditional branch can destroy your performance. In addition, every time you have to flush your data back to main memory, or load memory into the GPU, you pay for a lengthy operation that kills performance if done often. These new unified designs have the potential to nullify the impact of these operations, making GPU programming closer to CPU programming than ever before.
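To get a rough feel for the difference between flushing a whole buffer and moving only the cache lines another processor actually asks for, here is a toy back-of-the-envelope model in Python. It is only a sketch: the 64-byte cache line, the 1 MiB buffer, the list of touched offsets, and both function names are assumptions made up for illustration, not figures or APIs from ARM or RenderScript, and it counts bytes only (no latency or energy).

```python
CACHE_LINE_BYTES = 64  # assumed line size for illustration; real hardware may differ


def bytes_moved_full_flush(buffer_bytes: int) -> int:
    """Explicit-maintenance model: the whole shared buffer is written back."""
    return buffer_bytes


def bytes_moved_coherent(touched_offsets, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Coherent model: only the distinct cache lines actually requested are moved."""
    lines = {offset // line_bytes for offset in touched_offsets}
    return len(lines) * line_bytes


buffer_bytes = 1024 * 1024          # hypothetical 1 MiB shared buffer
touched = [0, 4, 8, 4096, 4100]     # hypothetical byte offsets the other processor reads

full = bytes_moved_full_flush(buffer_bytes)
coherent = bytes_moved_coherent(touched)
print(f"full flush: {full} bytes, coherent: {coherent} bytes")
```

In this made-up case the five reads fall into just two cache lines, so the coherent model moves 128 bytes where the full flush moves the entire megabyte. Real gains depend heavily on access patterns, so treat the numbers as an illustration of the idea rather than a benchmark.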

via GPU Computing in Android? With ARM Mali-T604 & RenderScript Compute You Can! – ARM Community.