I had the privilege of attending a talk by Luxology in which they detailed some of their internal research on CPU and GPU rendering technologies, finding GPU rendering approaches surprisingly disappointing in comparison. The presenter makes a great comparison between CPUs and GPUs, stating that GPUs are very “wide” but computationally “shallow”, in contrast to CPUs, which are very narrow but computationally deep.
They’ve released the presentation as a QuickTime video. I highly recommend you check it out and see their results. I have some issues with their findings, but it’s a great comparison that shows that GPUs are not a panacea for all problems. As the presenter says, “It’s clear that the GPU is not the magic bullet they had hoped”, before moving toward heterogeneous solutions that merge the best of both worlds. Later he shows some “pure research” work with networked rendering using the BOXX renderPRO system.
The picture on the right side is definitely not an Octane render.
The Octane beta is missing a filter that would prevent fireflies (very bright pixels)… and there is not one in this comparison.
The GPU is good for SIMD-friendly algorithms, and it is apparently not suitable for algorithms whose code path depends on the local data being rendered; adaptive volume rendering algorithms are another example where the GPU is inferior…
@ Whoever
You cannot compare the term “core” between CPUs and GPUs because their organization is fundamentally different.
A CPU core can execute its task completely independently, no matter which memory it accesses or which subroutine it branches into.
On the GPU, many cores are organized into something called a warp. All cores within that warp need to run the same piece of code, and should not branch differently from one another when conditions are evaluated. Otherwise the efficiency of that warp unit drops toward that of a single core.
To summarize, GPUs excel at straightforward arithmetic and run into trouble when your code has to check a lot of conditions and call other code.
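To make that concrete, here is a minimal CUDA sketch (my own illustration, nothing from the talk). Both kernels compute the same result, but the first branches on data, so warp-mates that disagree on the condition get serialized, while the second keeps every thread on the same instruction stream:

```cuda
#include <cstdio>
#include <cstdlib>

// Divergent kernel: threads in one warp take different branches depending
// on their data, so the hardware runs both paths back to back with threads
// masked off, and the warp slows toward single-core behaviour.
__global__ void divergent(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (in[i] > 0.5f)
        out[i] = sqrtf(in[i]);     // half the warp runs this...
    else
        out[i] = in[i] * in[i];    // ...then waits while the rest runs this
}

// Same result, written so every thread executes the same instruction
// stream; the compiler can turn the ?: into a predicated select.
__global__ void uniformPath(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float a = sqrtf(in[i]);
    float b = in[i] * in[i];
    out[i] = (in[i] > 0.5f) ? a : b;
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *hIn = (float *)malloc(bytes), *hOut = (float *)malloc(bytes);
    // Alternating values: the worst case, every warp splits on the branch.
    for (int i = 0; i < n; ++i) hIn[i] = (i % 2) ? 0.9f : 0.1f;

    float *dIn, *dOut;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    divergent<<<(n + 255) / 256, 256>>>(dIn, dOut, n);
    uniformPath<<<(n + 255) / 256, 256>>>(dIn, dOut, n);

    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);
    printf("out[0]=%f out[1]=%f\n", hOut[0], hOut[1]);
    cudaFree(dIn); cudaFree(dOut); free(hIn); free(hOut);
    return 0;
}
```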
Raytracing itself is a highly parallel problem, but to compute hit points quickly you need to traverse your acceleration data structure, which means checking a lot of conditions. The same is true when you are evaluating material behaviour: as soon as you have complex, realistic material behaviour there are quite a lot of conditions to check… and that is the reason there is NO real GPU production raytracer out there for professionals who earn money with it.
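For illustration, here is a stripped-down CUDA sketch of that kind of traversal loop (the node layout and names are mine, not from any production renderer; host setup and the per-triangle tests are omitted). The point is the density of data-dependent branches: each ray takes its own path through the tree, so warps diverge constantly.

```cuda
// Hypothetical, heavily simplified BVH node layout.
struct Node {
    float3 bmin, bmax;        // bounding box
    int left, right;          // child indices; left < 0 marks a leaf
    int firstTri, triCount;   // leaf payload
};

// Ray/box slab test: itself a branch per visited node.
__device__ bool intersectBox(const Node &n, float3 o, float3 invD, float tMax)
{
    float t1 = (n.bmin.x - o.x) * invD.x, t2 = (n.bmax.x - o.x) * invD.x;
    float tmin = fminf(t1, t2), tmax = fmaxf(t1, t2);
    t1 = (n.bmin.y - o.y) * invD.y; t2 = (n.bmax.y - o.y) * invD.y;
    tmin = fmaxf(tmin, fminf(t1, t2)); tmax = fminf(tmax, fmaxf(t1, t2));
    t1 = (n.bmin.z - o.z) * invD.z; t2 = (n.bmax.z - o.z) * invD.z;
    tmin = fmaxf(tmin, fminf(t1, t2)); tmax = fminf(tmax, fmaxf(t1, t2));
    return tmax >= fmaxf(tmin, 0.0f) && tmin <= tMax;
}

// One thread per ray; every iteration is a chain of data-dependent branches.
__global__ void traceClosestHit(const Node *nodes, const float3 *orig,
                                const float3 *dir, float *hitT, int nRays)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= nRays) return;                              // branch 1
    float3 o = orig[r];
    float3 invD = make_float3(1.0f / dir[r].x, 1.0f / dir[r].y, 1.0f / dir[r].z);
    float tBest = 1e30f;

    int stack[64], sp = 0;
    stack[sp++] = 0;                                     // push the root
    while (sp > 0) {                                     // branch 2: per ray
        const Node &n = nodes[stack[--sp]];
        if (!intersectBox(n, o, invD, tBest)) continue;  // branch 3: per node
        if (n.left < 0) {                                // branch 4: leaf?
            // per-triangle intersection tests would go here:
            // several more branches per triangle
        } else {
            stack[sp++] = n.left;
            stack[sp++] = n.right;
        }
    }
    hitT[r] = tBest;
}
```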
I am sure not everybody likes those facts, but at the end of the day everybody needs to use tools that fit their problem description.
I also want to share that this week I will be posting an article on GPUs vs. CPUs in relation to what I saw at SIGGRAPH. I’m currently waiting on some industry people to share quotes, which should come in early this week; then I can post it.
I invite all of you to return and we can continue the discussion there!
@Whatever
A modern GPU has about 512 cores, while most modern CPUs have 4 (and some come with 8). So if anyone is at a disadvantage here, it’s the CPU: 512 × 2 = 1024 cores against 4 × 12 = 48. There is also the matter of making 12 CPUs work together, while with the GPUs you only have 2, and that has already been taken care of by the drivers.
Now, reading the rest, I can clearly see that you have a very wrong perception of how things work on the GPU side, but let me just say: those shaders were removed because there is no way to implement them on current GPUs.
Oh, and @Randall, in computer science “algorithms for the GPU” have been around for about 30 years. They were just named differently: parallel and distributed computing.
And one other thing: software algorithms won’t improve (not by a noticeable margin, anyway). What might improve is the hardware, or the algorithms that operate the hardware, things like caching, branch prediction, or out-of-order execution (OOE).
Your comment about algorithms is partly true. Algorithms from the “distributed computing” and “parallel algorithms” space are a start, but they typically do not take into account the various restrictions of GPU-like systems (reduced instruction sets, limited branch support, low memory per thread/process). The closest match to what you describe would be the old “vector processor” algorithms from Cray-like systems. However, those algorithms have been pretty much ignored by the larger CS community as x86 took the overwhelming majority of the market and vector processors fell by the wayside.
You only have to look at conference proceedings from venues like IEEE VisWeek and SIGGRAPH’s technical tracks to see that great advances are still being made in the GPU algorithm space: not in revamping old algorithms, but in developing entirely new algorithms that better fit the architecture. Things like kernel-based Hough line transforms can extend traditional algorithms into GPU space to enable real-time processing of massive images, thanks to the parallel nature of the algorithm and the hardware.
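As a rough illustration of why the Hough transform maps so well to the GPU, here is a minimal CUDA kernel sketch (the names and bin counts are mine, not from any cited paper; edge detection and host-side setup are omitted, and rhoMax should be the image diagonal):

```cuda
#define N_THETA 180   // one angle bin per degree

// One thread per detected edge pixel; each casts N_THETA votes into the
// (theta, rho) accumulator. Thousands of pixels vote in parallel, and
// atomicAdd resolves collisions, a natural fit for the hardware.
__global__ void houghVotes(const int2 *edges, int nEdges,
                           unsigned int *accum, int rhoBins, float rhoMax)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nEdges) return;

    float x = (float)edges[i].x, y = (float)edges[i].y;
    for (int t = 0; t < N_THETA; ++t) {
        float theta = t * 3.14159265f / N_THETA;
        float rho = x * cosf(theta) + y * sinf(theta);
        // map rho from [-rhoMax, rhoMax] onto bin indices [0, rhoBins)
        int r = (int)((rho + rhoMax) * (rhoBins - 1) / (2.0f * rhoMax));
        atomicAdd(&accum[t * rhoBins + r], 1u);
    }
}
```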
Not all algorithms can easily be adapted in such a way, and in those cases new algorithms are emerging. Sorting is a classic GPU-unfriendly problem; however, new algorithms like GPUSort (great analysis here) prove that the work can be done on the GPU in new and vastly faster ways.
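For flavor, here is the classic bitonic sorting network in CUDA shared memory. This is my own single-block toy to show the lockstep compare-and-swap pattern that sorting networks give you, not GPUSort’s actual code:

```cuda
#include <cstdio>

#define N 1024  // power of two, fits in one block

// Bitonic sort: every thread performs the same compare-and-swap pattern
// in lockstep, which is exactly what the warp model wants. That is why
// sorting networks beat "CPU-style" comparison sorts on this hardware.
__global__ void bitonicSort(float *data)
{
    __shared__ float s[N];
    unsigned int tid = threadIdx.x;
    s[tid] = data[tid];
    __syncthreads();

    for (unsigned int k = 2; k <= N; k <<= 1) {          // bitonic stage
        for (unsigned int j = k >> 1; j > 0; j >>= 1) {  // sub-stage
            unsigned int partner = tid ^ j;
            if (partner > tid) {
                bool ascending = ((tid & k) == 0);
                if ((s[tid] > s[partner]) == ascending) {
                    float tmp = s[tid];
                    s[tid] = s[partner];
                    s[partner] = tmp;
                }
            }
            __syncthreads();
        }
    }
    data[tid] = s[tid];
}

int main()
{
    float h[N], *d;
    for (int i = 0; i < N; ++i) h[i] = (float)((i * 7919) % N);  // scrambled input
    cudaMalloc(&d, N * sizeof(float));
    cudaMemcpy(d, h, N * sizeof(float), cudaMemcpyHostToDevice);
    bitonicSort<<<1, N>>>(d);
    cudaMemcpy(h, d, N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("first: %.0f  last: %.0f\n", h[0], h[N - 1]);  // expect 0 and 1023
    cudaFree(d);
    return 0;
}
```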
Combining these new, improved algorithms with upcoming hardware advances (like the caching and branch prediction you mentioned) will only push GPU performance even further forward.
At the end of the day it all comes back to what solves a specific problem. While GPUs have made tremendous progress in the last few years, they are not (YET) a solution to quite a lot of problems. Luxology has software that delivers solutions to specific problems NOW, and they are confident that this is not possible with today’s GPUs. In a time with quite a lot of marketing behind RAYTRACING ON GPU IS THE HOLY GRAIL, I find they deserve quite some respect for having their own opinion. AFAIK the Maxwell guys are telling the same story.
GPGPU is a completely different architecture that excels in some areas and sucks in others. The programming tools suck quite a lot (common wisdom says you need roughly five times as much time to get an algorithm fast on a GPU), the GPU drivers do too, and there is limited memory on these devices. Consumer GPUs max out at about 2 GB these days if you want to stay cheap, and they get more expensive than high-end CPUs if you need 6 GB. If your problem needs more memory, GPGPU computing is no longer an option, because it simply doesn’t work anymore.
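If you want to check that constraint up front, a minimal sketch looks like this (cudaMemGetInfo is the real CUDA runtime call; the 3 GB scene size is a made-up example):

```cuda
#include <cstdio>

// Query free/total device memory before committing to a GPU path;
// if the scene will not fit, fall back to the CPU renderer.
int main()
{
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) {
        printf("no usable CUDA device, CPU path it is\n");
        return 0;
    }
    printf("GPU memory: %zu MB free of %zu MB\n",
           freeBytes >> 20, totalBytes >> 20);

    size_t sceneBytes = (size_t)3 << 30;  // hypothetical 3 GB scene
    if (sceneBytes > freeBytes)
        printf("scene does not fit on the GPU; use the CPU renderer\n");
    else
        printf("scene fits; the GPU path is an option\n");
    return 0;
}
```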
Everything might change when the next generations arrive, but today there are real limitations. Sometimes I wonder why people get so excited without any experience of their own…
BTW: The Caustic Two card is not happening anymore… ever thought about how expensive it is to roll out your own ASICs if your solution will not sell in ultra-high volumes? I think Caustic’s business model failed; they tried to get bought by Intel/NVIDIA/ATI and it simply did not happen… so now they are trying to offer some kind of standard middleware. I wonder how they want to make money with that?
I just lost my respect for Luxology. Is this test sponsored by *ntel or what?
There’s also a cost issue. Getting a workstation capable of holding 4 GPUs is cheaper than getting a workstation capable of holding 4 CPUs. And an X7560 costs three times as much as a C1060. Using an external PCIe chassis, the gap spreads even further.
It doesn’t help that they’re comparing raytracing, either, which of course favors the CPU. So when is the CausticTwo coming out?
The point about GPU algorithms being young and perhaps less than fully optimized is a good one. Over time I think the GPU will really come into its own.
The # of cores argument is not as simple as 12 vs. 2. Really it was 2 CPUs vs. 2 GPUs. If you want to talk cores, then it was 12 vs., gosh, a couple of hundred on the Quadro.
A key point not covered in this test is that the CPU is also superior at handling large scenes that do not fit into GPU memory.
@ Whatever: I agree 100%. The biggest problem with GPU algorithms today is that they are young. The CPU has had 30+ years of development, while the GPU has had maybe 5. Algorithms will improve, and things will get better.
There is a problem in the testing.
2 GPUs vs. 12 CPUs. I do not think I have heard the GPU marketers say a GPU is 10x faster than a dozen CPUs, but their “testing” clearly shows that one GPU is as fast as 6 CPUs (2 vs. 12).
So that’s a 6x speed gain without the same “tweaking” they have done in their engine, because let’s be honest: they surely did not go and optimize the shaders they sent to the GPUs the way they have optimized the CPU engine they have built over the last seven-plus years.
Some of the shaders that are “missing” are also questionable. Just because the GPU does not natively do those calculations does not mean they could not have written their own. It’s still a programmable unit, and things that are not in the box can be written for it, as sketched below.
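For example, a missing material term can be hand-rolled as ordinary CUDA device code. Here is a minimal sketch (the function names and the choice of Schlick’s Fresnel approximation are my illustration, not anything from Luxology’s engine):

```cuda
// Nothing says a "shader" must be a fixed-function GPU feature; any
// missing calculation can be written as an ordinary device function.
// Schlick's approximation of Fresnel reflectance as an example:
__device__ float fresnelSchlick(float cosTheta, float f0)
{
    float m = 1.0f - cosTheta;
    return f0 + (1.0f - f0) * m * m * m * m * m;  // f0 + (1-f0)(1-cos theta)^5
}

// Apply the hand-written term to a batch of shading points.
__global__ void shade(const float *cosThetas, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = fresnelSchlick(cosThetas[i], 0.04f);  // typical dielectric f0
}
```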
Kind of a weak test IMHO.