Stories from March 27th, 2012

NVIDIA Helps Power Bid for 2015 Moon Mission — and $30 Million in Prizes

A group of 100 scientists, engineers, and developers is working together on a bid for Google’s Lunar X Prize, a $30 million award to the first privately funded team to land a rover on the moon. Any bid will require tons of work in computation, hardware, and physics, but the German team is beefing up its systems with the power of NVIDIA Tesla GPUs.

The PTS team will benefit from the Tesla GPUs at all stages of the mission. During preparation and planning, GPUs will be used to simulate millions of different mission scenarios. This will enable the team to improve launch and landing techniques by, for example, adjusting the timing and duration of thruster burns for course corrections, while minimizing the margin of error.

Once Asimov has reached its destination, the PTS team will use the computational power of Tesla GPUs to navigate and monitor the rover’s activities and generate highly detailed lunar maps from the transmitted stereoscopic 3D images.

via NVIDIA Helps Power Bid for 2015 Moon Mission — and $30 Million in Prizes – NVIDIA Newsroom.

Hardware, Science

 
Stories from November 2nd, 2011

ARM Mali-T604 and GPU Computing in Android?

Google’s newest mobile OS, “Android 4.0”, has lots of improvements to the UI and guts of the system. However, one thing many people don’t know is that the new OS, combined with newer ARM systems, enables one additional exciting feature: GPU computing with the RenderScript API. That alone is impressive, but combined with some of the unique hardware features it could really prove amazing. Check out the new memory and cache system supported on the new Mali-T604 (rumored to be the guts of Samsung’s upcoming products):

The ARM Mali-T604 GPU is designed to work with the latest version (4) of the AMBA (Advanced Microcontroller Bus Architecture) which features Cache Coherent Interconnect (CCI). Data shared between processors in the system, a natural occurrence in heterogeneous computing, no longer requires costly (in terms of cycles and energy) synchronization via external memory and explicit cache maintenance operations. All of this is now performed in hardware, and is enabled transparently inside the drivers. In addition to reduced memory traffic, CCI avoids superfluous sharing of data: only data genuinely requested by another master is transferred to it, to the granularity of a cache line. No need to flush a whole buffer or data structure anymore.

These memory flushes are one of the worst things about modern GPU & GPGPU systems: one little branch conditional can destroy your performance. In addition, every time you have to flush your data back to main memory, or load memory into the GPU, you pay for a lengthy operation that kills performance if done often. These new unified designs have the potential to nullify the impact of these operations, making GPU programming closer to CPU programming than ever before.

via GPU Computing in Android? With ARM Mali-T604 & RenderScript Compute You Can! – ARM Community.

Hardware

 
Stories from October 24th, 2011

Workshop on General Purpose Processing on Graphics Processing Units

The fifth annual GPGPU Workshop will be held in London this March alongside ASPLOS 17, and is already accepting submissions of papers and abstracts.

The goal of this workshop is to provide a forum to discuss new and emerging general-purpose programming environments and platforms, as well as evaluate applications that have been able to harness the horsepower provided by these platforms. This year’s work is particularly interested in new heterogeneous GPU platforms.

The program committee has all the usual players (NVIDIA, AMD, Microsoft) and good university showings (MIT, Iowa State, Imperial College, etc.), along with a few surprises like JP Morgan Chase. The potential of GPUs for large-scale statistical modeling (as in finance and the stock markets) is of great interest to large financial houses, and it is an area where we can expect significant financial influence to weigh in.

via Workshop on General Purpose Processing on Graphics Processing Units.

Science

 
Stories from September 12th, 2011

octanerender running on a VDACTr8 with 8 GTX 580 GPUs

A promotional video from RenderStream, providers of multi-GPU systems for rendering and science, also does a great job of promoting OctaneRender on multiple GPUs.

octanerender demonstration running on a RenderStream VDACTr8 with 8 GTX 580 GPUs. In this video we demonstrate the rapid visual feedback one can expect when using an 8 GPU system in a detailed interior scene. We also show how well octanerender performs when scaling from 1-8 GPUs.

via octanerender running on a RenderStream VDACTr8 with 8 GTX 580 GPUs – YouTube.

Graphics

 
Stories from September 7th, 2011

AMD FirePro certified for Abaqus finite element analysis

AMD is pushing hard into the GPU-compute space with systems like Fusion, and has now managed to get its FirePro discrete card certified for OpenCL acceleration of the Abaqus Finite Element solver.

“Many of the tasks that used to take a full day to complete can now be done in about half that time with GPU compute, saving engineering time and resources during product research and design, and reducing overall time to market,” said Sandeep Gupte, general manager, AMD Professional Graphics. “With SIMULIA’s latest realistic simulation software, which is compliant with OpenCL standards, engineers can achieve precise results in their design analysis with minimal hardware limitations.”

via AMD FirePro certified for OpenCL-compliant Abacus finite element analysis (FEA) software | FireUser Blog.

Science

 
Stories from August 26th, 2011

Introduction to GPGPU Development using CUDA

Credit goes to InsideHPC for digging up a lecture by Rob Gillen at the recent CodeStock 2011 event on GPGPU Development.

Video: Introduction to GPGPU Development using CUDA | insideHPC.com.

Science

 
Stories from July 25th, 2011

Automatic CPU-GPU Communication Management and Optimization

InsideHPC dug up a nice paper from the ACM PLDI conference that discusses a prototype CPU-GPU communication optimization tool called ‘CGCM’. From their conclusions:

CGCM has two parts, a run-time library and an optimizing compiler.  The run-time library’s semantics allow the compiler to manage and optimize CPU-GPU communication without programmer annotations or heroic static analysis. The compiler breaks cyclic communication patterns by transferring data to the GPU early in the program and retrieving it only when necessary. CGCM outperforms inspector-executor systems on 24 programs and enables a whole program geomean speedup of 5.36x over best sequential CPU-only execution.

Impressive results, to say the least. Their examples seem based on CUDA code, not really OpenCL, although they do compare it to OpenMP-style approaches and CUDA-lite. It seems it becomes the programmer’s responsibility to mark data that’s to be shared between the GPU and CPU, and then some memory magic happens to move data between the two systems transparently.
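To make the "cyclic communication" idea from the quoted conclusions concrete, here is a minimal Python sketch. It is not CGCM itself and uses no real GPU API: `FakeDevice`, `cyclic`, and `hoisted` are hypothetical names invented for illustration, and the "device" only counts transfers. It compares copying data in and out around every kernel launch against uploading once before the loop and downloading once after it.

```python
class FakeDevice:
    """Stand-in for a GPU that only counts transfers (hypothetical, not a real API)."""

    def __init__(self):
        self.memory = {}
        self.to_device = 0  # number of host-to-device copies
        self.to_host = 0    # number of device-to-host copies

    def upload(self, name, data):
        self.memory[name] = list(data)
        self.to_device += 1

    def download(self, name):
        self.to_host += 1
        return list(self.memory[name])

    def kernel_scale(self, name, factor):
        # Pretend kernel: operates only on device-resident data.
        self.memory[name] = [x * factor for x in self.memory[name]]


def cyclic(dev, data, steps):
    """Copy in and out around every kernel launch (the cyclic pattern)."""
    for _ in range(steps):
        dev.upload("buf", data)
        dev.kernel_scale("buf", 2)
        data = dev.download("buf")
    return data


def hoisted(dev, data, steps):
    """Upload once before the loop and download once after it."""
    dev.upload("buf", data)
    for _ in range(steps):
        dev.kernel_scale("buf", 2)
    return dev.download("buf")


if __name__ == "__main__":
    slow, fast = FakeDevice(), FakeDevice()
    assert cyclic(slow, [1, 2, 3], 10) == hoisted(fast, [1, 2, 3], 10)
    print("cyclic transfers: ", slow.to_device + slow.to_host)
    print("hoisted transfers:", fast.to_device + fast.to_host)
```

Both versions compute the same result; only the number of transfers differs, and in this toy the gap grows with the number of loop iterations.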

It’s available as a PDF.

New Paper: Automatic CPU-GPU Communication Management and Optimization | insideHPC.com.

Science

 
Stories from June 15th, 2011

Microsoft Going All-in on GPU Computing

In a press event yesterday, AMD announced the next generation Fusion chips they are working on, and a tiny little note on one slide mentioned a new tool from Microsoft called AMP. Over on the NVIDIA blog, they give a few more details about it: it’s a new GPU programming tool from Microsoft.

Its intent with C++ AMP is to expose C++ language capabilities to millions of Windows developers with the goal of enabling them to take advantage of GPUs. It promises to give millions of C++ developers the option of using Microsoft Visual Studio-based development tools to accelerate applications using the parallel processing power of GPUs. CUDA C and CUDA C++ will continue to be the preferred platform for Linux apps or demanding HPC (high performance computing) applications that need to maximize performance.

via Microsoft Going All-in on GPU Computing « NVIDIA.

Science

 
Stories from June 8th, 2011

More Offerings Optimized for OpenCL™ Standard

AMD has a new press release out touting more OpenCL offerings, and it includes a nice list of OpenCL applications. It’s not as extensive as NVIDIA’s CUDA lists, but has some big names like ArcSoft, Corel, Sony Vegas Pro, and Rovi.

“Today’s creative professional needs a complete solution that delivers clear, crisp and stutter-free visuals that will allow them to edit, process and create content quickly and without interruption,” said Dave Chaimson, vice president of global marketing, Sony Creative Software. “New support has been added to Vegas Pro 10.0d for accelerated OpenCL based video rendering. We see this as a solid first step towards a faster production workflow for video professionals, and we are strongly committed to the OpenCL standard.”

Also of interest is a rather impressive list of engineering software using OpenCL for simulation acceleration. Dassault, Altair, and ESI are all on the list, along with a few others.

If you want to know more, they’ve got a conference (The AMD Fusion Developer Summit) coming up next week in Bellevue, Washington where they’ll be demonstrating them.

via AMD and Leading Software Vendors Continue to Expand Offerings Optimized for OpenCL™ Standard.

Science

 
Stories from May 30th, 2011

BitCoin: An Experiment in GPGPU

So, for the last week or so the internet has been abuzz with stories about “BitCoin”, the new all-digital currency that’s going to destabilize governments around the world and bring us to a new utopian society.  Well, yeah it’s a lot of hype.  But when I heard about the “mining” aspect of it, and how it’s almost entirely GPU based, I figured I would check it out.

From what I can tell, the “mining” part is really just a brute-force hash attack, looking for specific numbers. I ran some experiments with this once before as part of my Quadro 5000 review using HashGPU. Using the OpenCL bitcoin miner, I figured I could come up with some nice results. I had a machine handy with two GeForce GTX 285s in it, and easily managed to eke out about 64Mh/s on each card, for a total around 128Mh/s. (Mh/s = Million Hashes per Second). I also happen to have the Quadro 5000 card around, based on the Fermi Architecture, so I thought I’d throw it in as it’s not currently in the Wiki Hardware Results.
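For readers who want to see what a "brute-force hash attack" looks like, here is a toy CPU-only sketch in Python using the standard `hashlib` module. It is an illustration under stated assumptions, not the OpenCL miner mentioned above: the header bytes, the `mine` function, and the `zero_bits` difficulty knob are all made up, whereas real Bitcoin mining hashes an 80-byte block header against a network-defined target. The part that matches is the shape of the work: double SHA-256 over the data plus a nonce, repeated until the result falls below a target.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, the hash construction Bitcoin uses."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header: bytes, zero_bits: int, max_nonce: int = 2**32):
    """Try nonces in order until the hash has `zero_bits` leading zero bits.

    `header` and `zero_bits` are toy stand-ins for the real block header
    and difficulty target (assumptions for illustration only).
    """
    target = 1 << (256 - zero_bits)
    for nonce in range(max_nonce):
        digest = double_sha256(header + nonce.to_bytes(4, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    return None  # no nonce in range satisfied the target


if __name__ == "__main__":
    found = mine(b"toy block header", zero_bits=16)
    if found is not None:
        nonce, digest = found
        print("nonce:", nonce)
        print("hash: ", digest)
```

Each extra zero bit doubles the expected number of attempts, and every attempt is independent of the others, which is why the job spreads so easily across thousands of GPU threads.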

I was very disappointed to find that my Quadro 5000 could only manage about 58-59Mh/s, a startling 10% less than the GTX 285. This truly baffled me. The Quadro 5000, which handily beats AMD cards in most benchmarks, falls waaay behind the ATI offerings, which easily rake in 100+Mh/s, some hitting 300Mh/s.

All in all, I ran with two GTX 285s for about 4 days, and mined all of 2 BitCoins. Presumably with a single $700 AMD Radeon 6990, which is claimed to rake in over 650Mh/s, I could have made 5x that. It’s interesting to see that as popular as CUDA is, there are still several problems where AMD’s “stream” design beats NVIDIA’s hands-down.
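The figures in this post can be sanity-checked with a few lines of Python. This sketch uses only the numbers quoted above, takes 58.5Mh/s as the midpoint of the 58-59Mh/s range, and assumes that coins mined scale linearly with hash rate at a fixed difficulty over the same four days. That is a simplification, since mining is probabilistic and difficulty changes over time.

```python
# Hash rates quoted in the post, in millions of hashes per second (Mh/s).
GTX285_MHS = 64.0
QUADRO_5000_MHS = 58.5   # midpoint of the quoted 58-59 Mh/s (assumption)
RADEON_6990_MHS = 650.0  # vendor-claimed figure quoted in the post
COINS_MINED = 2.0        # mined with two GTX 285s in about 4 days


def relative_shortfall(slower: float, faster: float) -> float:
    """Fraction by which `slower` falls short of `faster`."""
    return (faster - slower) / faster


def projected_coins(coins: float, old_rate: float, new_rate: float) -> float:
    """Scale a mining result linearly with hash rate (simplifying assumption)."""
    return coins * new_rate / old_rate


pair_rate = 2 * GTX285_MHS
shortfall = relative_shortfall(QUADRO_5000_MHS, GTX285_MHS)
speedup = RADEON_6990_MHS / pair_rate
projected = projected_coins(COINS_MINED, pair_rate, RADEON_6990_MHS)

if __name__ == "__main__":
    print(f"two GTX 285s: {pair_rate:.0f} Mh/s")
    print(f"Quadro 5000 shortfall vs one GTX 285: {shortfall:.1%}")
    print(f"Radeon 6990 vs two GTX 285s: {speedup:.2f}x")
    print(f"projected coins over the same period: {projected:.1f}")
```

Under those assumptions the Quadro's shortfall comes out slightly under the rounded 10%, and the "5x" estimate for the Radeon 6990 holds up.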

And of course, this wouldn’t be a BitCoin article if I didn’t include “If you liked this article, feel free to send some BitCoins to 1HXHDYeQux5BVzwTF5gEwMS2MgaKXwXeft”.

P.S. If anyone actually does send me any, shoot me an email with the amount for a Shout-Out here.

Update 7pm: Wow.. I just refreshed my wallet and thanks to the 3 people who actually sent me a total of 0.12 BTC, the equivalent of about $1 at current exchange rates.

Hardware, Science

VizWorld.com is a production of VizWorld, LLC © 2009