Anyone working in data analysis and visualization will tell you that the #1 problem facing them is file storage. As datasets get bigger and bigger, moving them from the HPC systems to the visualization resources becomes a bigger pain. Oak Ridge National Laboratory has been facing this problem for a while now, and has just recently stood up a distributed file system named ‘Spider’ to fix it.

Previously, once a project ran an application on Jaguar, it had to move the data to the Lens visualization platform for analysis, and any problem encountered along the way meant repeating the whole cumbersome process. With Spider connected to both Jaguar and Lens, however, that headache is avoided. As the HPCwire article puts it: “You can think of it as eliminating islands of data. Instead of having to multiply file systems all within the NCCS, one for each of our simulation platforms, we have a single file system that is available anywhere. If you are using extremely large data sets on the order of 200 terabytes, it could save you hours and hours.”
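To put “hours and hours” in perspective, here’s a quick back-of-the-envelope calculation in Python. The 10 Gb/s link speed is my own assumption for illustration, not a figure from the article:

```python
# Rough estimate of how long copying a 200 TB dataset takes over a
# network link. The 10 Gb/s line rate is an assumption; real-world
# transfers rarely sustain full line rate.
dataset_bytes = 200 * 10**12          # 200 terabytes
link_bytes_per_sec = 10 * 10**9 / 8   # assumed 10 Gb/s link

hours = dataset_bytes / link_bytes_per_sec / 3600
print(f"~{hours:.0f} hours at full line rate")  # ~44 hours
```

Nearly two days per copy, under generous assumptions, which is exactly the cost a shared file system eliminates.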

While this is nice, it still doesn’t solve the problem of then getting that data into memory for analysis. But at least you don’t have to spend a month waiting on an FTP transfer to finish anymore.

Update:  I spoke with a source at ORNL, and they corrected a few things:

  • Spider isn’t new; it’s been around for at least a year.
  • It’s 10.7 petabytes.
  • They don’t use FTP; they use SCP and HSI (a rough sketch of scripting both follows below).
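For the curious, here’s a minimal sketch of what scripting those two transfer tools might look like. The host name, paths, and the exact hsi command syntax are my own assumptions, not details from ORNL:

```python
# Minimal sketch of scripted transfers with scp and hsi.
# Host names and paths below are hypothetical examples.
import subprocess

def scp_copy(src: str, dest: str) -> None:
    # Copy a file between machines over SSH.
    subprocess.run(["scp", src, dest], check=True)

def hsi_put(local_path: str, hpss_path: str) -> None:
    # Archive a file into HPSS; hsi accepts its command as an argument.
    subprocess.run(["hsi", f"put {local_path} : {hpss_path}"], check=True)

if __name__ == "__main__":
    scp_copy("run042.h5", "user@viz-host:/scratch/run042.h5")
    hsi_put("run042.h5", "/home/user/archive/run042.h5")
```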

So if it’s not new, why the press release? I’m not really sure, to be honest. My suspicion is that it was previously in a testing mode and has just officially entered “production” and general availability.

via HPCwire: Spider Up and Spinning Connections to All Computing Platforms at ORNL.