A new project on Google Code called ‘Ocelot’ aims to compile CUDA programs for execution on NVIDIA GPUs and x86 CPUs.
Ocelot is a dynamic compilation framework for heterogeneous systems, providing various backend targets for CUDA programs. Ocelot currently allows CUDA programs to be executed on NVIDIA GPUs and x86 CPUs at full speed without recompilation.
The project is freely available and BSD-licensed, and aims to integrate with Harmony as a means for programming on heterogeneous multicore architectures (FPGA, Cell or Larrabee).
via gpuocelot – Project Hosting on Google Code.
Jan Vlietinck has published a simple 200x200x200 fluid simulation, computed and rendered using the new DirectX 11 DirectCompute GPU acceleration system.
The calculations use a well-known scheme of velocity advection, Jacobi pressure solving, and making the velocity divergence-free by subtracting the gradient of the pressure. This is the so-called Semi-Lagrangian scheme. A more accurate solver uses the second-order MacCormack technique, which is what the simulation employs. However, it makes the simulation unstable and introduces artifacts. Limiting the generated extremes can fix this; unfortunately I was not able to get this working, so the simulation runs without limiters. Still, the result is some visually interesting turbulent behavior.
The amplitude of the speed vectors is visualized. To make a 3D rendering, a simple ray maximum projection is used: rays are shot through the volume searching for the maximum speed along each ray, and that speed is mapped to a color with linear interpolation.
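The quoted pipeline can be sketched in plain Python (a 1D toy with names of our own choosing, not Vlietinck's DirectCompute shaders, which run on a 200^3 grid):

```python
# Toy 1D sketches of the techniques named above: Semi-Lagrangian advection,
# Jacobi pressure relaxation, and a maximum projection along a ray.
# Illustrative only -- the real simulation does all of this on the GPU.

def advect(field, velocity, dt):
    """Semi-Lagrangian step: trace each sample backwards along the
    velocity and linearly interpolate the field at the departure point."""
    n = len(field)
    out = []
    for i in range(n):
        x = min(max(i - velocity[i] * dt, 0.0), n - 1.0)  # clamp to grid
        i0 = int(x)
        i1 = min(i0 + 1, n - 1)
        t = x - i0
        out.append((1.0 - t) * field[i0] + t * field[i1])
    return out

def jacobi_pressure(divergence, iterations=50):
    """Jacobi relaxation for the 1D Poisson problem p'' = div; the
    pressure gradient would then be subtracted from the velocity
    to make it divergence-free."""
    n = len(divergence)
    p = [0.0] * n
    for _ in range(iterations):
        p = [(p[max(i - 1, 0)] + p[min(i + 1, n - 1)] - divergence[i]) / 2.0
             for i in range(n)]
    return p

def max_projection(samples):
    """Maximum projection: the value a ray reports is simply the largest
    speed it encounters while marching through the volume."""
    return max(samples)
```

The unconditional back-trace-and-interpolate in `advect` is what makes the scheme unconditionally stable (at the price of smoothing), which is why the more accurate MacCormack variant needs the limiters the author mentions.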
via Fast software renderer.
Do you remember the story we posted, ORNL Looks to NVidia GT300 for next Super, back on September 30th? Well, according to SemiAccurate.com, the project was killed since Fermi consumes too much power. However, Legit Reviews contacted several people at NVidia and ORNL who all say that the rumor is false.
Read the original rumor at: Oak Ridge cans Nvidia based Fermi supercomputer
Read the rebuttal at: False Rumor – Oak Ridge Cancels NVIDIA Fermi Based Supercomputer!
Adobe has rolled out a new playback engine called “Mercury” in their new Premiere Pro CS5 product that offers astounding performance by pushing most of the work onto the GPU. Adobe’s video guru Dennis Radeke explains:
In the post, Dennis went on to explain: “What is the Mercury Playback engine about? In a word, performance! It makes Premiere Pro do cartwheels and flips and barely breaks a sweat. It's like rocket fuel for your car. It's flat out incredible…” While that statement might sound over-enthusiastic, read on: “In my first test of Mercury, I dropped several P2 clips on a timeline, made them picture-in-picture and looked to see if there were any dropped frames during playback…nada. I added more clips, bringing it up to eight or nine on my HP XW9400 with 12 cores of AMD goodness… Think it's the CPU? No! It's only being used at about 20-30%. It's GPU! I keep going and there is no hesitation in Premiere Pro. Okay, lets add some color correction to each one and while we're at it, lets drop in some blurs [that will stop it right?] Still playin' like buttah!”
What’s particularly interesting is that the technology they are using is exclusive to NVidia, since it is built on CUDA. Of course it may not remain that way, but with AMD having difficulty with their OpenCL driver, CUDA is probably the best option available right now. As BSN suggests:
Thus, it isn’t surprising to see Adobe going to CUDA first. The plan is probably equal to all plans that we heard so far: go to CUDA in order to completely unlock the GPU potential and only then port to OpenCL, as Apple’s and AMD’s OpenCL toolkits mature, sometime in 2011.
via Adobe’s Mercury Playback Engine for CS5 is CUDA-only! – Bright Side Of News*.
A free webinar from NVidia and GoToMeeting will feature the new “Fermi” architecture and its capabilities for GPU computing, along with the previously mentioned “Mad Science Promotion”.
NVIDIA’s next generation CUDA architecture, code named “Fermi” is the most advanced GPU computing architecture ever built. Join us for a live webinar to learn about the new Tesla GPU Compute solutions built on Fermi and the dramatic performance capabilities they offer customers who are tackling the most difficult, compute-intensive problems. In addition you will learn about our limited time offer, the Mad Science Promotion, whereby you may qualify for a promotional upgrade to a new NVIDIA Fermi-based Tesla product when you purchase a NVIDIA® Tesla™ C1060 GPU Computing Processor or a S1070 1U GPU Computing System today.
The webinar is next Wednesday (December 16th) at 10AM PST. Registration is on their website.
via GoToWebinar : Webinars & Web Events Made Easy. Award-Winning Web Casting & Online Seminar Hosting Software.
In a promotion sure to make Wimpy proud, Nvidia is offering a Fermi tomorrow for a Tesla today. Buy any of their current high-end Tesla cards, and get a free upgrade to the new Fermi-based equivalent when it’s released.
When you purchase a Tesla C1060 GPU Computing Processor through this promotional offer, you will qualify for a no penalty upgrade to a Tesla C2050 or a Tesla C2070 GPU Computing Processor. Start experiencing GPU computing today on a Tesla C1060 and be assured to be one of the first to receive the new Fermi-based Tesla C2050/C2070 GPU Computing Processor.
Warm up those wallets folks, they’re gonna go fast!
via Mad Science Promo.
SiSoftware has just announced the availability of its first OpenCL GPGPU benchmarks as part of Sandra 2010, and AMD has now announced that they’ll be supporting it as a global OpenCL benchmark package. Common benchmark suites like Linpack are popular in the HPC arena as a way to compare performance characteristics across systems. Some of the features in Sandra 2010 include:
- 4 architectures natively supported (x86, x64/AMD64/EM64T, IA64/Itanium2, ARM)
- 6 languages supported (English, French, German, Italian, Japanese, Russian)
- AMD OpenCL 1.0
- nVidia OpenCL 1.0
- GPU + CPU parallel execution supported, up to 8 devices in total.
- Different models of GPUs supported, including integrated GPU + dedicated GPUs.
- Multi-GPUs supported, up to 8 in parallel.
The package is available in several versions with varying feature sets, ranging from a free “Lite” Version to a full “Enterprise” edition.
SiSoftware Zone Sandra2010 Announcement, via AMD Press Release
A new C++ library called HPMC sponsored by the Research Council of Norway enables isosurface extraction of volumetric data directly on the GPU using histogram pyramids and vertex shaders.
The library analyzes a lattice of scalar values describing a scalar field that is either stored in a Texture3D or can be accessed through an application-provided snippet of shader code. The output is a sequence of vertex positions and normals that form a triangulation of the iso-surface. HPMC provides traversal code to be included in an application vertex shader, which allows direct extraction in the vertex shader. Using the OpenGL transform feedback mechanism, the triangulation can be stored directly into a buffer object.
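The histogram pyramid idea behind HPMC can be sketched in plain Python (a 1D toy with our own names, not HPMC's shader code; HPMC actually stores the pyramid as a 2D mipmap). Each base cell holds the number of vertices that marching-cubes cell will emit; pairwise sums build a pyramid whose apex is the total output size, and a top-down traversal finds which cell produces output element k:

```python
# Toy 1D histogram pyramid: bottom-up construction, top-down lookup.

def build_pyramid(base):
    """Bottom-up reduction: each level pairwise-sums the one below,
    so the single top element is the total vertex count."""
    levels = [list(base)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        if len(prev) % 2:
            prev = prev + [0]          # pad odd levels with an empty cell
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

def locate(levels, k):
    """Top-down traversal: find which base cell emits output element k.
    Returns (cell index, offset of the element within that cell)."""
    i = 0
    for level in reversed(levels[:-1]):
        i *= 2                         # descend to the left child
        if k >= level[i]:              # element lies in the right child
            k -= level[i]
            i += 1
    return i, k
```

On the GPU this traversal runs once per output vertex in the vertex shader, which is how a data-dependent output size (the iso-surface triangulation) can be produced by a fixed-size draw call.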
Requires CMake, GLEW, and a good video card, but runs on Windows, Linux, and Mac.
via Marching Cubes using Histogram Pyramids. (Details of the Algorithm, PDF)
AccelerEyes has released Jacket GBENCH 1.0 for benchmarking GPU performance on your system across a wide variety of scientific algorithms, including LU decomposition, FFTs, BLAS, 3D convolutions, and more.
GBENCH is a practical application benchmark measured in real seconds and is not meant to be a scientific or theoretical benchmark measured in GFLOPs. Also note that for fairness, arithmetic precisions (e.g. double, single) have been matched on the CPU and GPU. Finally, the data sizes used in these computations are large enough to exploit data parallelism (e.g. no scalar arithmetic was attempted). This benchmark assumes a data parallel problem.
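The “real seconds” approach described above can be illustrated with a trivial sketch in plain Python (nothing to do with GBENCH's MATLAB internals; names are ours): a practical benchmark simply wall-clocks the whole operation at a realistically large data size.

```python
import time

def bench_seconds(fn, *args, repeats=3):
    """Time a whole operation in wall-clock seconds, keeping the best
    of a few repeats to damp out system noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Use a data size large enough that per-call overhead is negligible,
# mirroring GBENCH's "large enough to exploit data parallelism" rule.
data = list(range(1_000_000))
elapsed = bench_seconds(sum, data)
```

Matching arithmetic precision between CPU and GPU runs, as GBENCH does, matters because single-precision GPU paths would otherwise get an unfair speed advantage over double-precision CPU code.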
It requires the MATLAB Compiler Runtime, but runs on both Windows and Linux.
via AccelerEyes – Jacket GBENCH – For GPU System Benchmarking.
CAPS Entreprise was at SC09 announcing the latest version of their HMPP compiler toolsuite. If you are not familiar with their product, it is a toolsuite that aids hybrid GPU/CPU software development. They were nice enough to invite us for a talk and give us a demonstration.
At its heart, HMPP is an extension to existing compilers that helps with GPU compute code. With the wide variety of compilers (Microsoft, PGI, GNU, Intel), IDEs (Visual Studio, Eclipse, Emacs), and GPU languages (CUDA, OpenCL, Streams, Brook), it can be daunting to develop applications that work efficiently across all these environments. HMPP aims to make this much simpler by letting you insert simple directives into your code indicating which data structures and routines to accelerate.