While at GTC, I managed to slip off the grid for a few hours and head to the other camp: the realm where GPUs do not reign supreme and companies dedicate themselves to squeezing every last clock cycle of performance from their algorithms. In a nondescript office building, I met with Fovia’s CEO Kenneth Fineman and President & CTO George Buyanovsky for a demonstration of their ‘High Definition Volume Rendering®’ product, and I have to admit I’m impressed.
The product is essentially an SDK, or library, for integrating high-speed, high-quality volume rendering into other applications, and as such they’ve already got an impressive customer list, including biomedical imaging companies like GE and Pfizer, dental imaging companies like 3M and iDent, along with some classic standbys of such technology like NASA and the US Military. I was privy to a demonstration of their test-bed application, running entirely on the CPU of an 8-core system (with Hyper-Threading enabled, for 16 logical cores), showing various biomedical datasets nearly fullscreen on a 1920×1080 display. The visuals were beautiful, easily running at 8 to 30 fps depending on the number of concurrently running clients and the rendering complexity.
Fovia was founded by Ken and George, both former employees of a high-end computer graphics card manufacturer, back in 2003. Unsatisfied with the prevailing “put the graphics in hardware” designs from their employer and the likes of NVIDIA and ATI, they dedicated themselves to demonstrating that the same work could be done on the CPU, and done faster and better. The result is the HDVR® product they now license. They make use of the most modern instruction sets for high-speed vector computation, along with parallelism across cores. Freed from the restrictive instruction sets of most current GPU designs, they were able to create vastly more complex visualizations using adaptive ray sampling, adaptive step sizes, and many other optimizations that are not easily implemented in GPU algorithms.
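Fovia hasn’t published the details of HDVR’s algorithm, but the adaptive step-size idea itself is easy to illustrate. Here’s a toy sketch of my own (not Fovia’s code): take large steps while the sampled density is near zero, drop to fine steps inside dense regions, and terminate early once the ray is nearly opaque.

```python
# Toy illustration of adaptive step-size ray marching (my own sketch;
# Fovia has not published HDVR's actual algorithm).

def density(t):
    """Toy 1-D density field along a ray: empty space, then a dense slab."""
    return 0.8 if 4.0 <= t <= 6.0 else 0.0

def march(t_max=10.0, coarse=0.5, fine=0.05, threshold=0.01):
    """Accumulate opacity front-to-back, taking large steps through
    empty space and small steps inside dense regions."""
    t, alpha, samples = 0.0, 0.0, 0
    while t < t_max and alpha < 0.99:             # early-ray termination
        d = density(t)
        step = coarse if d < threshold else fine  # adapt to local density
        alpha += (1.0 - alpha) * d * step         # standard opacity update
        t += step
        samples += 1
    return alpha, samples

alpha, samples = march()
# Uniform fine stepping would need 10.0 / 0.05 = 200 samples;
# the adaptive march covers the same ray in far fewer.
```

The payoff is that most rays through a medical scan pass mostly through air, so skipping empty space quickly is where much of the speed comes from.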
Read more about Fovia & HDVR® after the break.
During the course of the 90-minute demonstration, I saw several datasets of varying sizes loaded and visualized interactively. The largest was a full 2,300-slice CT scan. They state the HDVR product can interactively render up to a 4K cube given sufficient memory, though the 12 GB of RAM in the demo system maxes out around a 1.8K cube, still well beyond most scanners. The transfer functions (of which up to 8 can be active simultaneously) can easily be adjusted interactively to define colors and opacity, as well as faux lighting models. Their transfer model support made it trivial to visualize both bone and muscle at the same time, with lighting on the well-defined bones and no lighting on the rough muscle.
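For readers unfamiliar with transfer functions: they are simply mappings from scalar voxel values to color and opacity. A toy sketch of the idea, with hypothetical CT-like thresholds (HDVR’s actual model is far richer):

```python
# Toy 1-D transfer function: map a scalar voxel value to RGBA.
# Thresholds here are hypothetical CT-like values, purely for
# illustration; HDVR supports up to 8 simultaneous functions.

def transfer(value):
    if value >= 300:                 # bone-like densities: bright, opaque
        return (1.0, 1.0, 0.9, 0.95)
    if value >= 40:                  # muscle / soft tissue: translucent red
        return (0.8, 0.2, 0.2, 0.15)
    return (0.0, 0.0, 0.0, 0.0)      # air / background: fully transparent

# Adjusting these thresholds interactively is what lets an operator
# "peel away" tissue layers without reprocessing the volume.
rgba = [transfer(v) for v in (-1000, 60, 500)]   # air, muscle, bone
```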
As the demo went on, they demonstrated a special client-server model they support, where the one Windows server connected to two MacBook Air laptops over a simple 802.11n wireless network with full interactivity. (They told me they support any combination of Windows, Linux, and Mac on both the client and server side.) Honestly, you would never know it was done remotely, as both systems were just as interactive as the local demonstrations. Each of the two laptops loaded a different dataset and interacted with it independently, though they stated that the same setup could also be used for collaboration on a single dataset.
Toward the end, they showed some of the other features they’ve integrated, such as support for polygonal geometry rendering. This one could hold special interest for visualization scientists who currently run into problems with the large geometric models produced by the classic isosurface algorithms on large data. With their raycasting system, the resulting framerate depends on frame size (and ray detail), not on geometry size. As such, you can load some truly huge models and interact with them with ease. Also, due to their algorithm, you can easily render semitransparent geometry, or mix polygonal geometry with volumetric geometry, with no performance penalty, unlike most GPU solutions, which require either depth-sorting or depth-peeling algorithms. Currently, most large-scale visualization systems (like VisIt, ParaView, and EnSight) try to use polygonal rendering systems like Mesa along with frame compositing to do this work, but raycasting solutions are far simpler to parallelize (each ray can be computed independently) and typically produce higher-quality visuals anyway. Fovia has a leg up on the competition in this regard; however, they do require that the entire model fit in RAM on the machine, which is a deal-killer for the extreme-scale viz coming out of systems in use by the DoD and DoE. People working in more typical environments should find this incredibly useful, though.
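The depth-sorting point deserves a quick illustration. In a raycaster, samples arrive along each ray already in depth order, so semitransparent material composites correctly with the standard front-to-back “over” operator and no sorting pass. A minimal sketch (not Fovia’s implementation):

```python
# Why a raycaster needs no depth-sorting pass: samples arrive along
# each ray already ordered by depth, so the standard front-to-back
# "over" operator composites them directly. (Illustrative sketch,
# not Fovia's implementation.)

def composite(ray_samples):
    """Front-to-back compositing of (luminance, alpha) samples
    encountered along one ray, nearest first."""
    color, alpha = 0.0, 0.0
    for c, a in ray_samples:
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        if alpha >= 0.99:            # early-ray termination
            break
    return color, alpha

# Two semitransparent samples plus an opaque one, mixed freely
# because they are already ordered along the ray.
color, alpha = composite([(1.0, 0.3), (0.5, 0.5), (0.2, 1.0)])
```

A rasterizer has to reconstruct this depth ordering per pixel (via sorting or depth peeling), which is exactly the cost the raycasting approach avoids.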
The demo is impressive, and the technology is already integrated into various medical imaging machines. It seems a perfect fit, as embedded computers are more prolific and easier to work with than embedded graphics chips, and the performance makes near-immediate visualization of the captured scans possible. However, there are a few points that I feel compelled to make:
- Fovia is very proud (and rightfully so) of the speed and detail of their solution. While GPU solutions can probably match them on speed, those solutions typically lack detail. It’s easy to load a dataset as a 3D texture in video memory and render it in hardware, but the result is not going to look as good as the Fovia solution. The trade-off is between speed and detail: a GPU could probably win on raw speed but lose on the resulting detail. GPU solutions are improving, however, and in fact a few people were demonstrating such technology at GTC last week. See, for example, the poster from Harvard’s School of Engineering & Applied Sciences that claims interactive visualization of a 92 GB EM model across 1 to 8 Tesla nodes.
- Fovia (correctly) claims that GPUs cannot hold datasets of this size. I have to give them this one: you’re not going to hold a 4K cube as-is in GPU memory (4K cubed is 68 GigaVoxels). However, where do you get a 4K cube? Most imaging systems work at significantly smaller scales, and Fovia admits that their ‘large’ datasets are either non-medical or post-processed & stitched scans. On the other hand, more systems are supporting “4D scans”, which are time-varying 3D scans. The result is a simple multiplication of the data size, creating spatially small datasets that can still occupy large amounts of disk space and memory. In addition, if you have any ‘multidimensional’ data (such as a maintained gradient volume or any other derived dataset), you wind up doubling or tripling your dataset size.
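A quick back-of-the-envelope check on these sizes, assuming 16-bit scalar voxels (typical for CT):

```python
# Back-of-the-envelope sizes for the datasets discussed above,
# assuming 16-bit scalar voxels (typical for CT).

BYTES_PER_VOXEL = 2

def volume_gib(dim, timesteps=1, channels=1):
    """Size in GiB of a dim^3 volume, optionally time-varying or
    carrying extra derived channels (e.g. a gradient volume)."""
    return dim**3 * timesteps * channels * BYTES_PER_VOXEL / 1024**3

full_4k = volume_gib(4096)                   # 128 GiB: no GPU holds this as-is
fits_12gb = volume_gib(1800)                 # ~10.9 GiB: the ~1.8K-cube limit
four_d = volume_gib(512, timesteps=100)      # 25 GiB: a "small" 4D scan adds up
with_gradient = volume_gib(512, channels=2)  # a derived volume doubles the size
```

The 4D and multi-channel cases show how even spatially modest scans balloon well past GPU memory, which is the point being made above.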
- Fovia is big on the claim that adaptive sampling can’t be done effectively on the GPU. This one is iffy, at best. No doubt, programming on the GPU is hard, and such adaptive methods are that much harder. However, it has been done before using different techniques: at GTC, a few people discussed adaptive methods based on ray bundling and multipass rendering, though these are still very early in development and their full implications aren’t yet understood.
But even with that, Fovia has a few huge advantages:
- Perhaps the biggest is the cost advantage. Sure, Harvard demonstrated a system that could perform visualization of similar quality and performance on the GPU, but it took 8 Teslas to do it. Fovia does it with 2 regular CPUs.
- GPU clusters are rare in the field and come with maintenance and upkeep headaches: airflow problems, hardware failures, and power consumption are all issues with GPU clusters of any significant size.
- Computational clusters are everywhere, and can be used for a wide variety of purposes. Take that big accounting cluster that’s only used 1 day a week or 1 week a month for billing, and use it for big HDVR visualization the rest of the time.
- Also, while I was unable to test or see this, they claim the system has almost purely linear scaling: double your CPU core count, double your performance. That’s great, but their solution is currently limited to what you can pack into a single box.
And finally, the most important from a business aspect: it’s available today. The Harvard system is a great proof of concept, but it’s still in the research stages. Several of the other GPU solutions out there are similar: conceptually sound but still “in the lab” and not ready for production. Fovia’s solution is available, and actively in use by several customers in several settings today.
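On that scaling claim: near-linear scaling is plausible precisely because rays are independent, so a frame can be split into row chunks with no shared state between workers. A rough sketch of the pattern (my own illustration, not Fovia’s code):

```python
# Why near-linear core scaling is plausible: every ray is independent,
# so a frame can be split into row chunks with no shared state.
# (A sketch of the parallelization pattern, not Fovia's code.)
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48

def shade_row(y):
    # Stand-in for marching one ray per pixel in row y.
    return [(x + y) % 256 for x in range(WIDTH)]

def render(workers):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves row order, so the frame assembles correctly
        # no matter how rows were scheduled across workers.
        return list(pool.map(shade_row, range(HEIGHT)))

frame = render(workers=8)   # identical output for any worker count
```

Real scaling, of course, depends on memory bandwidth and cache behavior, not just the absence of shared state, which is why their claimed near-linearity is still worth independent verification.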
Fovia is a company to keep an eye on. Their CPU-only solution is impressive, and it will be interesting to see how they respond to increased market pressure as more GPU-accelerated systems come online. GPU algorithms and development tools will improve (Parallel Nsight will be a huge bonus there), and GPU memory sizes and performance figures will improve, but of course so will CPUs. The impressive parallelism they can demonstrate right now shows they have an early start on their competition, and I can’t wait to see where they go to maintain their edge.
If you want to know more about Fovia:
- Visit their website www.fovia.com
- Watch Apple’s “Fovia Plugin for OsiriX” demo
Whoops. Something got messed up in my post. Apparently you can’t put a less-than sign in front of a greater-than sign without causing some HTML formatting issues.
What my last paragraph SHOULD have said is this…
“The ratio of “less than 4GB” datasets to “greater than 4GB” datasets is, in our business, probably 100:1, and the few that are larger than 4GB are all either multivariate or larger than 4096 in at least one axis.”
The point being, if you looked at all the datasets that we need rendered and noted which ones could not (trivially) be rendered on a GPU, those same datasets also could not be rendered with Fovia’s HDVR.
We DO end up having to render those datasets occasionally, and we DO do them on the CPU (in system memory), but using software that is able to handle their very large size and multivariate/multidimensional nature. It’s not interactive (we have a GPU-rendered interactive proxy), but they DO render eventually.
My experience, however, has been that framerates are never as important as being able to actually render the data at all. Interactive quality and large viewports don’t matter if the transfer function I need isn’t supported, or the types of data I need to render aren’t supported. As I said before, rendering multiple multivariate datasets with multidimensional transfer functions is trivial with standard OpenGL approaches, provided you can get the data in memory, which you can for clinical datasets.
The ratio of 4GB datasets is, in our business, probably 100:1, and the few that are >4GB are all either multivariate or >4096 in at least one axis.
>Typical clinical CT datasets fit in GPU memory
Indeed, it barely does for a high-end GPU. However, the major issue is GPU performance: obtaining interactive, high-quality volume rendering (for mid-size medical CT data, 1K–2K slices of 512×512 at 12 bits) at big viewports (~1080p). The GPU memory constraint is indeed an issue, but in my experience it is not very relevant as long as interactive quality isn’t matched; what’s the point of being able to load the data into memory if you still get far lower interactive quality than a CPU can provide? I would love to run side-by-side a dual X5650 1U vs. the newest Tesla S2070.
Typical clinical CT datasets fit in GPU memory on even modest tablet computers. While it’s possible that industrial CT users are working with larger data, projecting demand for 4K scan rendering without projecting an increase in GPU memory seems unreasonable to me. A fairer comparison is to look at the actual features available for current, typical datasets. Multiple multivariate datasets rendered using multidimensional classification with advanced shading and lighting are available using standard OpenGL rendering techniques. While you might not be able to (trivially) load a giant scalar volume onto a GPU, we are seeing a LOT more multivariate datasets than we are 4K^3 datasets.
>Seems like remote rendering in most cases, especially when limited
>to one box, is not necessary anymore, when you can get a name brand
>workstation with 12, 24, or 48 (in 2011) logical cores (Westmere)
>and 64 GB RAM for ~ $5-7K or less.
Such a workstation is exactly what HDVR needs to render top-quality VR at interactive rates (it runs fine on a 4-year-old laptop as well, at lower quality); you may render locally or remotely, using the workstation as the server. From HDVR’s API point of view there is virtually no difference between rendering locally and remotely. I’ve tried to find a GPU VR engine to match HDVR, but so far nothing comes even close. HDVR running on a dual-X5650 1U blade dramatically outperforms every GPU-based VR I’ve tested on a dual-SLI GTX 480 setup (for a typical mid-size CT dataset); once the CT dataset gets bigger, HDVR’s performance stays pretty much the same, and its multi-core scalability seems just about perfect. I’m looking hard for the best GPU-based VR to match against HDVR, so if you have one, please contact me at stefanbanev “at” yahoo.com; I would be glad to compare it with HDVR.
Non-power users in an extreme context cannot be assured of GPU rendering either. If you want to render a 4 GB dataset on a netbook, or on a handheld device, remote rendering is the way to go. Whether your remote rendering is done on CPU or GPU is up for debate, but it will be a long time before commodity hardware can do volume rendering locally.
Seems like remote rendering in most cases, especially when limited to one box, is not necessary anymore, when you can get a name brand workstation with 12, 24, or 48 (in 2011) logical cores (Westmere) and 64 GB RAM for ~ $5-7K or less.
The big-iron Intel/AMD servers have more RAM slots, but not that much more processing power (if any), and can cost multiples of the price.
Then you have users competing for resources (disk, network, CPU), and you have to manage the infrastructure. You also have one expensive piece of equipment to upgrade, whereas with workstations you can upgrade users as needed.
Perhaps where you have non-power users, remote rendering is still viable, but then relatively standard local GPU rendering would likely suffice.
Doesn’t matter. Fovia doesn’t (or at least didn’t last we looked) support multivariate data either. 🙂
We do both approaches for our work, and the choice depends on the factors you mention. More often than not, we don’t resample, as the data is usually vastly different in resolution (like PET/CT), and we end up just doing the more complex rendering (which ends up being just compute time, which is cheaper than user time).
@Chad That’s a good point, and one that’s frequently brought up in many packages. There’s always a bit of an argument over whether it’s worth adding support for multiple datasets, or instead using a multivariate approach where you register the two volumes and resample them onto a common grid.
The multivariate approach is easier, but you do run the risk of losing a bit of detail in the resample step. Multiple datasets don’t have the detail loss, but then you have to deal with the complexities of separate datasets, which could require a significant duplication of processing.
Which do you think is “better”?
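As a tiny 1-D illustration of the detail-loss risk in the resample step: a sharp one-voxel feature is attenuated when linearly resampled onto a coarser common grid (toy code, not any particular registration pipeline):

```python
# Tiny 1-D illustration of the detail-loss risk in the resample step:
# a sharp one-voxel feature is attenuated after linear resampling onto
# a coarser common grid. (Toy example, not any particular pipeline.)

def lerp_sample(data, pos):
    """Linearly interpolate a uniform 1-D grid at a fractional index."""
    i = min(int(pos), len(data) - 2)
    f = pos - i
    return data[i] * (1 - f) + data[i + 1] * f

src = [0, 0, 0, 10, 0, 0, 0, 0]                  # one-voxel-wide "feature"
scale = (len(src) - 1) / 3.0                     # resample 8 samples onto 4
coarse = [lerp_sample(src, i * scale) for i in range(4)]
peak_after = max(coarse)                         # well below the original 10
```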
I mean spatially, not concurrently. You couldn’t load multiple MR datasets, or compare a CT to an ultrasound, or compare a current scan to a prior scan. Rendering one dataset in one window and another dataset in another window doesn’t cut it.
Chad>”At the time you also couldn’t render more than one dataset at a time.”
A quite revealing statement… HDVR® is a CPU-based engine; nothing precludes running any number of HDVR® engine instances concurrently with different datasets (or the same one, if required). That has always been the case. For a CPU-based engine this is very simple, and it gives a significant competitive advantage over GPU-based solutions. Multiple concurrently running HDVR® instances have been the backbone of our client-server architecture from day one, and they are the reason the HDVR® client-server is so flexible and efficient. You may contact [email protected] for accurate information regarding HDVR®’s capabilities.
While the cost advantage does exist for the hardware, you are essentially only licensing a rendering library. Considering that it’s ONLY that, the cost of the license means that unless you plan on serving many users, the savings won’t materialize. Not sure how their technology has changed since we last had a demo of Fovia, but there were some really bad limitations. Like you can’t render multivariate data; only single scalar data can be rendered (and only integer). You also cannot render a dataset that is larger than 4096 in ANY dimension, so you can’t do something like 64×64×4097. At the time, you also couldn’t render more than one dataset at a time. Finally, we didn’t like how you were locked into their rendering algorithms; we wanted better lighting and shading models, the kinds we weren’t seeing in their system.
I must say, I DO like the client/server system they have. That was really nice, and it would make them a top choice if you were looking for a remote rendering solution.