NextIO – PCIe Expandability, Virtualization, & Hot Swap

How many times have you looked at your motherboard in despair at its meager 2 or 3 PCI Express slots? Maybe you’re one of the lucky people with the new Tyan board and have 6 or 8? As more and more power-hungry and I/O-hungry devices come to market, the PCIe bus is used more and more. Devices like PCIe SSDs, high-speed networking interconnects, and (of course) GPUs are all fighting for those precious slots. And Murphy’s Law dictates that just as your cluster is fully assembled and online, something will need to be upgraded or replaced.

NextIO has an intriguing solution for this common problem, but it’s not one many people have thought of. With their existing N1400-PCM, you can expand your PCIe bus out to an impressive 14 slots, and 24 with their newest product (N2800-ICA). But it doesn’t end there.

NextIO’s product features can be summed up in three words: extensibility, virtualization, and hot swap. These aren’t words people typically associate with PCIe devices, but the NextIO solution makes all of them possible.

Extensibility

The first and most obvious feature of the NextIO product is its ability to extend the PCIe bus. On the show floor they had put 9 (of a possible 14) RamSans from Texas Memory Systems into their 3U chassis, creating an SSD array that was previously impossible. This alone is impressive, but remember that this is an external unit, so it could be connected to a 1U server blade to create an impossibly fast database server. SSDs aren’t the only option, though: any PCIe device can be installed, so get yourself a 1U blade with Teslas in the box, or get insane network bandwidth by filling the box with InfiniBand or GigE cards.

But extensibility is more than just adding extra resources. GPUs and network cards tend to be less reliable than other computer hardware: NIC ports blow out, GPUs overheat, and both are prone to failures that require you to physically replace them. Rather than having to shut the node down and pull it from the rack for maintenance, you can leave it running in a “reduced functionality” state (i.e., minus the PCIe device) while you replace the card in the NextIO chassis. In fact, with certain devices such as network cards, you may never even notice one is missing, because you can reallocate the remaining cards to be “shared” across nodes. This is made possible by the next feature: virtualization.

Update 11/30/09: Read our follow-up about how much bandwidth you can expect, with a statement from NextIO, here.

Virtualization

The N2800

Virtualization of system images is all the rage right now, and NextIO really takes it to the next level in their new second-generation product. NextIO helped design the Multi-Root I/O Virtualization (MR-IOV) specification, a standard that allows PCIe device virtualization (you can read the spec at the PCI-SIG here). You can install any MR-IOV-compliant device and virtualize it among any number of connected servers. GPUs, unfortunately, are not compliant, but you can mix and match. Imagine this scenario:

You have a small cluster, eight 1U units, that you use for your visualization work. Your problem would really benefit from GPU acceleration (for either compute or rendering), but you can’t fit a decent card in a 1U space. Also, all of the network cables and cards really make a mess. You could buy a single second-gen NextIO unit, which consumes 3U in your rack, and drop in eight graphics cards (Quadro, Tesla, Radeon, whatever) and one or two NICs. Cable each of the eight nodes to this one box, then use the provided management software to assign a GPU to each node and virtualize the two NICs across the eight nodes, each NIC serving four nodes.
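The assignment in this scenario is simple arithmetic: eight dedicated GPUs, one per node, and two shared NICs split into blocks of four nodes each. The sketch below is only an illustration of that plan, not NextIO’s management software; the function `plan_assignment` and all node, GPU, and NIC names are hypothetical.

```python
from collections import Counter


def plan_assignment(nodes, gpus, nics):
    """Give every node its own GPU and spread the nodes evenly across shared NICs.

    Hypothetical helper: names and data shapes are illustrative only.
    """
    if len(gpus) < len(nodes):
        raise ValueError("need at least one dedicated GPU per node")
    if not nics:
        raise ValueError("need at least one NIC to share")
    plan = {}
    for i, node in enumerate(nodes):
        # GPUs are not MR-IOV compliant, so each one goes to a single node.
        gpu = gpus[i]
        # Consecutive blocks of nodes share a NIC: with 8 nodes and 2 NICs,
        # nodes 0-3 map to NIC 0 and nodes 4-7 map to NIC 1.
        nic = nics[i * len(nics) // len(nodes)]
        plan[node] = (gpu, nic)
    return plan


nodes = [f"node{n}" for n in range(1, 9)]
gpus = [f"gpu{n}" for n in range(8)]
nics = ["nic0", "nic1"]
plan = plan_assignment(nodes, gpus, nics)
for node, (gpu, nic) in plan.items():
    print(node, gpu, nic)
print(Counter(nic for _, nic in plan.values()))  # Counter({'nic0': 4, 'nic1': 4})
```

Mapping nodes to NICs in contiguous blocks keeps the split even whenever the node count is a multiple of the NIC count; with uneven counts the blocks differ in size by at most one node.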

In a scenario like this you’ve breathed new visualization life into a previously compute-only cluster by adding eight graphics cards. And that’s not all: you could then dynamically switch the configuration to give certain nodes access to multiple GPUs when needed. That’s possible through the next major feature: hot swap.

Hot Swap

Hot swap for PCI devices has been possible in principle for a long time, but it was never really supported by hardware or operating systems. In fact, if you power up your computer right now and rip out a PCI device, it’ll act just as though you’d pulled out a USB stick: a little notification that the device was removed, and all is well (provided you didn’t need it for some critical function). Putting the device back in is the tricky part, both electrically and in software. You don’t want anything to arc and permanently damage the card, and most operating systems today don’t even plan for that to happen, so they crash ungracefully (a BSoD for you Windows users).
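On the Linux side, the kernel exposes software-initiated PCI removal and re-enumeration through sysfs: writing `1` to `/sys/bus/pci/devices/<address>/remove` detaches a device, and writing `1` to `/sys/bus/pci/rescan` makes the kernel look for new ones. The sketch below only builds and (optionally) triggers those writes; it is an assumption about what the OS half of a hot swap can look like, not a description of what NextIO’s software does, and the address `0000:03:00.0` is hypothetical.

```python
import re
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci")
# Full PCI address: domain:bus:device.function, e.g. "0000:03:00.0".
PCI_ADDRESS = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-1][0-9a-f]\.[0-7]$")


def remove_file(address: str) -> Path:
    """sysfs file that detaches the device at `address` from the kernel."""
    address = address.lower()
    if not PCI_ADDRESS.match(address):
        raise ValueError(f"not a full PCI address: {address!r}")
    return SYSFS_PCI / "devices" / address / "remove"


def rescan_file() -> Path:
    """sysfs file that makes the kernel re-enumerate the PCI buses."""
    return SYSFS_PCI / "rescan"


def trigger(path: Path, dry_run: bool = True) -> str:
    """Write "1" to a sysfs control file (needs root); dry_run only reports it."""
    if not dry_run:
        path.write_text("1")
    return f"echo 1 > {path}"


if __name__ == "__main__":
    # Hypothetical address of the card being swapped.
    print(trigger(remove_file("0000:03:00.0")))  # before pulling the card
    print(trigger(rescan_file()))               # after reinserting it
```

Defaulting to `dry_run=True` keeps the helper harmless when run without root; pass `dry_run=False` only on a machine where detaching the device is actually intended.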

The second-gen NextIO chassis puts each card in its own electrically isolated, removable “chassis” that, once disabled via the management console, can easily be slid out the back of the rack for maintenance, repair, or upgrade. Once you slide it back in, it automatically reappears in the console and you can assign it to the desired node(s). Beyond this point it requires some OS support, so on Windows you’ll probably have to reboot to take full advantage of the device. On Linux, however, they were demonstrating a live hot swap of an NVIDIA Tesla card in their booth. The demo went like this:

Run “lspci” on the node and show the Tesla card in the list.
Run a quick CUDA demo (a Monte Carlo simulation) to show the results.
Fire up the management console and shut down the card.
Physically remove the card from the NextIO chassis.
Run “lspci” on the node again; the card is no longer listed.
Run the same CUDA demo again; it fails because no CUDA-capable device is available.
Put the Tesla back in the chassis.
Re-enable it in the management console.
Run “lspci” on the node a third time; the card is listed again.
Run the CUDA demo, and it works again.
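The “lspci” checks in the first, fifth, and ninth steps are easy to script. A minimal sketch, assuming the card appears with “NVIDIA” in its default lspci description; `nvidia_slots` and `card_present` are hypothetical helpers, not part of NextIO’s tooling.

```python
import subprocess


def nvidia_slots(lspci_output: str) -> list[str]:
    """Return the bus addresses of NVIDIA devices found in plain `lspci` output."""
    slots = []
    for line in lspci_output.splitlines():
        # Default lspci lines look like "03:00.0 3D controller: NVIDIA Corporation ..."
        if "nvidia" in line.lower():
            slots.append(line.split(maxsplit=1)[0])
    return slots


def card_present() -> bool:
    """Run lspci on this node and report whether any NVIDIA device is listed."""
    out = subprocess.run(["lspci"], capture_output=True, text=True, check=True).stdout
    return bool(nvidia_slots(out))


if __name__ == "__main__":
    print("NVIDIA card listed:", card_present())
```

Splitting the parsing from the `subprocess` call lets the check be tested against captured output, and the same function answers “present” before the removal and after the re-enable, and “absent” in between.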

Four 1U Nodes connected to the N2800-ICA

Very impressive, and for large HPC visualization clusters, a godsend. The management console can be controlled from the command line, and NextIO claims to have worked with queue-control providers (specifically Moab) so that your job submissions can request various hardware resources and have them dynamically allocated to you as needed. Imagine augmenting your giant Cray or SGI supercomputer (like Jaguar at ORNL, recently crowned #1) with a few of these boxes, then letting users submit visualization jobs that request 1024 processors and 8 GPUs, and having it all work.

The technology is impressive, and NextIO is definitely a player to watch in the near future. I know I’ll be keeping an eye on them. If you want more information, you can contact NextIO via: