Stories from January 16th, 2012

glu3D GPU Edition for 3dsMax

There’s a new version of the glu3D fluid motion simulator for 3dsMax that brings it up to date with 3dsMax2012 and adds in CUDA computation for improved performance.

It speeds up intensive computations that glu3D performs to calculate particle fluid dynamics. With this glu3D edition it is possible to simulate faster and with more particles than before; it is possible to simulate your scene more times; it is almost interactive. Animated fluid scenes with more quality and level of detai

You can download their free Demo version at their site, or buy it for $760.

via glu3D GPU Edition for 3dsMax – NEW!.

Graphics

 
Stories from August 26th, 2011

Introduction to GPGPU Development using CUDA

Credit goes to InsideHPC for digging up a lecture by Rob Gillen at the recent CodeStock 2011 event on GPGPU Development.

Video: Introduction to GPGPU Development using CUDA | insideHPC.com.

Science

 
Stories from July 25th, 2011

Automatic CPU-GPU Communication Management and Optimization


InsideHPC dug up a nice paper from the ACM PLDI conference that discusses a prototype CPU-GPU Communication optimization tool called ‘CGCM’.  From their conclusions:

CGCM has two parts, a run-time library and an optimizing compiler.  The run-time library’s semantics allow the compiler to manage and optimize CPU-GPU communication without programmer annotations or heroic static analysis. The compiler breaks cyclic communication patterns by transferring data to the GPU early in the program and retrieving it only when necessary. CGCM outperforms inspector-executor systems on 24 programs and enables a whole program geomean speedup of 5.36x over best sequential CPU-only execution.

Impressive results to say the least.  Their examples seem based on CUDA code, not really OpenCL, although they do relate it to tools like OpenMP approaches and CUDA-lite.  It seems it becomes the programmer’s responsibility to mark data that’s to be shared between the GPU and CPU, and then some memory magic happens to move data between the two systems transparently.
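To make the "cyclic communication" idea from the quoted conclusions concrete, here is a toy model in Python. It is only a sketch: `ToyDevice`, `upload`, `download`, and `scale_kernel` are made-up names for illustration, not part of CGCM or CUDA, and the "GPU" is just a dictionary that counts host-device copies. It compares a loop that copies data back and forth on every step with one that uploads once, keeps the data resident, and downloads only at the end.

```python
class ToyDevice:
    """Hypothetical stand-in for a GPU; it only counts host<->device copies."""

    def __init__(self):
        self.memory = {}
        self.transfers = 0

    def upload(self, name, data):
        # Host -> device copy.
        self.memory[name] = list(data)
        self.transfers += 1

    def download(self, name):
        # Device -> host copy.
        self.transfers += 1
        return list(self.memory[name])

    def scale_kernel(self, name, factor):
        # Work done "on the device": no host<->device copy involved.
        self.memory[name] = [x * factor for x in self.memory[name]]


def cyclic(data, steps):
    """Naive pattern: copy in, run the kernel, copy out, on every step."""
    dev = ToyDevice()
    for _ in range(steps):
        dev.upload("a", data)
        dev.scale_kernel("a", 2)
        data = dev.download("a")
    return data, dev.transfers


def hoisted(data, steps):
    """Copy in once, keep the data on the device, copy out once at the end."""
    dev = ToyDevice()
    dev.upload("a", data)
    for _ in range(steps):
        dev.scale_kernel("a", 2)
    return dev.download("a"), dev.transfers
```

Both functions compute the same result, but under this toy model the first makes two copies per step while the second makes two copies in total, which is the kind of saving the paper's conclusions describe.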

It’s available as a PDF.

New Paper: Automatic CPU-GPU Communication Management and Optimization | insideHPC.com.

Science

 
Stories from July 5th, 2011

NVIDIA Names Stanford University a CUDA Center of Excellence

NVidia has just announced the newest entrant into the ever-growing CUDA Center of Excellence program, Stanford University.  Stanford already has a CUDA architecture and parallel computing program, so adding them to the Center of Excellence program is really a mere formality.

“It’s vitally important that our faculty be at the forefront of computing technology so that we can continue developing state-of-the-art computational algorithms that drive innovation in the sciences and engineering,” said Margot Gerritsen, director, Institute for Computational & Mathematical Engineering, and associate professor, Department of Energy Resources Engineering, at Stanford University.  “This award allows us to broadly expand parallel computing education and research programs to large numbers of researchers and students from a wide variety of disciplines.”

The grants provided by NVidia as part of the program will be used to support some new research programs on mesh-based solvers for partial differential equations and probability and uncertainty quantification work.

Get the full release after the break.

Science

 
Stories from May 30th, 2011

BitCoin: An Experiment in GPGPU

So, for the last week or so the internet has been abuzz with stories about “BitCoin”, the new all-digital currency that’s going to destabilize governments around the world and bring us to a new utopian society.  Well, yeah it’s a lot of hype.  But when I heard about the “mining” aspect of it, and how it’s almost entirely GPU based, I figured I would check it out.

From what I can tell, the “mining” part is really just a brute-force hash attack, looking for specific numbers.  I ran some experiments with this once before as part of my Quadro5000 review using HashGPU.  Using the OpenCL bitcoin miner, I figured I could come up with some nice results.  I had a machine handy with two GeForce GTX285s in it, and easily managed to eke out about 64Mh/s on each card, for a total around 128Mh/s (Mh/s = Million Hashes per Second).  I also happen to have the Quadro 5000 card around, based on the Fermi Architecture, so I thought I’d throw it in as it’s not currently in the Wiki Hardware Results.
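For a feel of what that brute-force search looks like, here is a minimal CPU-only sketch in Python. It is not the OpenCL miner mentioned above: the header bytes and the very easy target are placeholders chosen for illustration, and `mine` is a hypothetical helper name. It does follow the general shape of Bitcoin's proof of work, which is to append a 32-bit nonce, hash twice with SHA-256, and accept the nonce when the hash, read as a little-endian integer, falls below a target.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces in order until the double hash falls below the target.

    Returns (nonce, digest) on success, or None if no nonce in range works.
    """
    for nonce in range(max_nonce):
        # The nonce is packed as a 4-byte little-endian unsigned integer.
        candidate = header_prefix + struct.pack("<I", nonce)
        digest = double_sha256(candidate)
        # The 32-byte hash is compared to the target as a little-endian number.
        if int.from_bytes(digest, "little") < target:
            return nonce, digest
    return None


# Placeholder input and a deliberately easy target (about 1 hash in 256 passes).
result = mine(b"toy block header", 1 << 248)
```

Every nonce is independent of the others, which is why the work spreads so well across the thousands of threads on a GPU.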

I was very disappointed to find that my Quadro5000 could only manage about  58-59Mh/s, a startling 10% less than the GTX285.  This truly baffled me.  The Quadro 5000, which handily beats AMD cards in most benchmarks, falls waaay behind the ATI offerings which easily rake in 100+Mh/s, some hitting 300Mh/s.

All in all, I ran with two GTX285s for about 4 days, and mined all of 2 BitCoins.  Presumably with a single $700 AMD Radeon 6990, which claims to rake in over 650Mh/s, I could have made 5x that.  It’s interesting to see that as popular as CUDA is, there are still several problems where AMD’s “stream” design beats NVidia’s hands-down.
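The back-of-the-envelope numbers above can be checked in a few lines of Python. The figures come straight from this post; the assumptions are that coins mined scale linearly with hash rate at a fixed difficulty, and that 58.5Mh/s (the midpoint of 58-59) stands in for the Quadro 5000.

```python
# Hash rates in Mh/s, as quoted in the post.
gtx285_rate = 64.0
num_gtx285 = 2
radeon_6990_rate = 650.0
quadro5000_rate = 58.5  # assumed midpoint of the 58-59 Mh/s range

coins_mined = 2  # over roughly 4 days on the two GTX285s

# Combined rate of the two-card rig.
rig_rate = gtx285_rate * num_gtx285

# How much faster the claimed Radeon 6990 rate is than the whole rig.
speedup = radeon_6990_rate / rig_rate

# Projected coins over the same period, assuming linear scaling with hash rate.
projected_coins = coins_mined * speedup

# Fraction by which the Quadro 5000 trails a single GTX285.
quadro_deficit = 1 - quadro5000_rate / gtx285_rate
```

Under those assumptions the Radeon comes out a little over 5x the two-card rig, and the Quadro 5000 trails a single GTX285 by roughly 9%, close to the 10% quoted above.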

And of course, this wouldn’t be a BitCoin article if I didn’t include “If you liked this article, feel free to send some BitCoins to 1HXHDYeQux5BVzwTF5gEwMS2MgaKXwXeft“.

P.S. If anyone actually does send me any, shoot me an email with the amount for a Shout-Out here.

Update 7pm: Wow.. I just refreshed my wallet and thanks to the 3 people who actually sent me a total of 0.12 BTC, the equivalent of about $1 at current exchange rates.

Hardware, Science

 
Stories from April 5th, 2011

Keeneland Workshop on CUDA & GPU Development

The Georgia Tech NVIDIA CUDA Center of Excellence is preparing a nice two-day tutorial on GPU programming and heterogeneous computing, including both CUDA and OpenCL.  The event will only cost you a $100 registration fee and the cost of your room and time, making it one of the best ways to get into GPU programming.

Hit the website for all the details and links.

Keeneland Workshop | Keeneland.

Science

 
Stories from February 28th, 2011

NVidia Announces CUDA 4.0

Big news from Nvidia today as they announce the latest version of their GPGPU toolkit, CUDA 4.0.  This new version has all the usual performance enhancements and bugfixes, but also comes with 3 new features that I can guarantee all you CUDA developers are going to love.

  • GPUDirect2.0 – The previous version of GPUDirect worked with clusters based on Mellanox InfiniBand backbones, but this new version works with multiple GPU cards in a single machine.  Where previously the CPU was involved in memory transfers between cards, now you can DMA transfer directly between cards using MPI-style Send & Receive commands.

  • Unified Virtual Addressing – Now, CPUs and GPUs all show up in a single uniform address space.  This makes moving memory between them much easier.
  • Integrated Thrust Support – A great C++ library similar to LABLAS and CULAPACK, it adds in standard template constructs for all the popular data types and algorithms.  Thrust has a great following and active community, and boasts run-time selection of CPU vs GPU code, making the resulting code a bit more portable than previous CUDA.

All of this goes to further reinforce NVidia’s commitment to not just building nice graphics cards, but to continue to build and support a developer community around the computational capabilities of their hardware.  With renewed support in their Tegra line and the new ARM cores on the horizon, NVidia knows that having a wide community of developers ready to go on the new hardware is critical to mainstream market success.  Microsoft is already pumping up support for a future ARM-based Windows, and ARM already has wide support in many embedded applications like settop boxes, smartphones, and tablets.  Tools like Unified Virtual Addressing and GPUDirect2 further Nvidia’s attempts to tear down the barriers between the CPU and GPU, making future porting to ARM systems simpler.

Get the full details of the release in the Press Release after the break.

Read more…

Hardware, Science

 
Stories from February 2nd, 2011

DNeg Accelerating Visual Effects with NVIDIA Quadro and CUDA

Another win for NVidia in the VFX and CUDA space comes from Double Negative, who found their proprietary fluid simulation system “Squirt” getting a nice 20x performance boost.

“Moving our fluid solver onto the GPU allows our artists to get the results of their simulations back much faster, without any impact to their workflow,” explained Dan Bailey, lead GPU developer, Double Negative. “By default, fluid simulations are now sent to a specialized GPU farm, affording the artists more time to iterate and ramp up the complexity of a shot to achieve a more believable result for the big screen.”

It took 6 months of hard work to complete the transition, and the 20x number they’re seeing isn’t even using a Quadro4000 (Fermi-based).  Just wait until they upgrade to see the numbers then.

via DNeg Accelerating Visual Effects with NVIDIA Quadro and CUDA « NVIDIA.

Graphics, Hardware

 
Stories from January 27th, 2011

Parallel Nsight 1.51 Pro Now Available Free

Big news for Visual Studio developers using CUDA: NVidia has just announced that the newest version of Parallel Nsight, 1.51 Professional Edition, is now available free for everyone!  It comes with all of the great professional tools, so this is huge news and a mandatory download for anyone doing NVidia GPU development on Windows.

Parallel Nsight Professional Edition is now available for all Visual Studio 2008 and 2010 developers, free of charge. NVIDIA is now offering the full Parallel Nsight Pro feature set at no cost, as we historically have done with CUDA and our other development tools, so that a broader range of developers can take advantage of the full benefits of this popular parallel computing development tool.  Parallel Nsight support will continue via the Parallel Nsight forums, and Professional developers are encouraged to sign up for NVIDIA’s Registered Developer Program, which provides priority access to new software, bug management tools, invitations to members-only developer webinars and other development resources.

In addition to making it a free download, they’ve removed all of the license key and activation stuff, so it really is a ‘no strings attached’ release.

via Parallel Nsight Download | Parallel Nsight.

Science

 
Stories from January 25th, 2011

Sorenson Squeeze 7 goes GPU for 7x Speed Boost

Sorenson Squeeze 7, the newest version of Sorenson Media’s application for video encoding to a wide variety of formats, adds support for NVidia GPUs and CUDA acceleration.  The end result is up to a 3x boost in encoding speed, taking those hour-long jobs down to a mere 20 minutes.

By utilizing GPUs such as NVIDIA Quadro® professional graphics solutions, the specialized microprocessors that power graphics in professional workstations, Sorenson Squeeze 7 delivers significantly faster encoding times. Sorenson Squeeze 7 automatically recognizes when the user’s primary CPU or GPU may be faster and will use the better resource for the encoding job. The software is optimized for NVIDIA CUDA, the parallel computing architecture created by NVIDIA that powers a variety of their popular GPUs. Internal benchmark tests have shown Sorenson Squeeze 7 is up to three times faster than Sorenson Squeeze 6 when encoding in the H.264 format using GPU acceleration.

Now, that 3x boost is only for H.264, which, as much as Google complains about it, is really still the de facto standard for Web Video.  The new version also adds adaptive bitrate support and several new formats like BluRay and WebM, and is available for $799 for new users. Full details after the break.

Read more…

Science

VizWorld.com is a production of VizWorld, LLC © 2009