It's no surprise that Nvidia has big plans on the horizon, but X-bit labs claims to have gotten its hands on some of the post-Maxwell plans, which include exascale computing designs that pack 20 teraFLOPS of double-precision math into a single chip, offering over 2 petaFLOPS in a rack.

According to Steve Keckler, the director of architecture research at Nvidia, the Echelon design incorporates a large number (~1024) of stream cores and a smaller number (~8) of latency-optimized CPU-like cores on a single chip, sharing a common memory system. Just like in current architectures, eight stream cores will form a streaming multiprocessor (SM), and 128 SMs will form the large pool of throughput-optimized processing elements. Such a chip could deliver 20 teraFLOPS in double precision, and a number of them will form a 2.6 petaFLOPS rack. At present, Nvidia's Fermi (GF110) chip with 512 stream processors operating at 1544MHz can deliver 0.79 TFLOPS of DP compute performance. Considering the roughly 25-fold difference in performance, it is highly likely that Echelon will employ a post-Maxwell (~2013-2014) Nvidia GPU design.
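For anyone who wants to check the arithmetic behind those figures, here is a rough back-of-the-envelope sketch in Python. The core counts, clock speed, and FLOPS targets come from the paragraph above. Two things are my own assumptions and are not stated in the source: that a fused multiply-add is counted as 2 floating-point operations per cycle, and that double precision runs at half the single-precision rate on the Fermi part. The chips-per-rack number is only what the 2.6 petaFLOPS and 20 teraFLOPS figures imply, not something the article states.

```python
# Figures quoted in the article.
FERMI_CORES = 512            # stream processors on GF110
FERMI_CLOCK_HZ = 1544e6      # 1544 MHz shader clock
ECHELON_CHIP_TFLOPS = 20.0   # double-precision target per chip
ECHELON_RACK_PFLOPS = 2.6    # double-precision target per rack

# Assumptions (NOT from the article): one fused multiply-add counts as
# 2 floating-point operations, and DP runs at half the SP rate.
FLOPS_PER_CYCLE = 2
DP_RATE = 0.5


def peak_dp_tflops(cores, clock_hz,
                   flops_per_cycle=FLOPS_PER_CYCLE, dp_rate=DP_RATE):
    """Theoretical peak double-precision throughput in teraFLOPS."""
    return cores * clock_hz * flops_per_cycle * dp_rate / 1e12


# Echelon layout: 8 stream cores per SM, 128 SMs per chip.
stream_cores = 8 * 128

fermi_dp = peak_dp_tflops(FERMI_CORES, FERMI_CLOCK_HZ)
speedup = ECHELON_CHIP_TFLOPS / fermi_dp
chips_per_rack = ECHELON_RACK_PFLOPS * 1000 / ECHELON_CHIP_TFLOPS

print(f"Echelon stream cores per chip: {stream_cores}")
print(f"Fermi peak DP: {fermi_dp:.2f} TFLOPS")
print(f"Echelon vs. Fermi: about {speedup:.0f}x")
print(f"Implied chips per rack: about {chips_per_rack:.0f}")
```

Under those assumptions the numbers line up with the quoted ones: 8 x 128 gives the ~1024 stream cores, Fermi lands at about 0.79 TFLOPS, and the gap to a 20 TFLOPS chip is about 25x.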

Some of the designs look almost like an April Fools’ joke, with terms like ‘Self Aware OS’ (Skynet, anyone?), but this looks like it could be a direct competitor to designs like Intel’s ‘single-chip cloud’.

via Nvidia: Graphics Chips with 20TFLOPS DP Performance Needed for ExaFLOPS Supercomputers – X-bit labs.