Virgile Adam of the Katholieke Universiteit Leuven in Belgium [and collaborators] describe the ultimate in holographic (three-dimensional) data storage: a chemically pure crystal composed solely of proteins that can be read and reversibly switched between at least two different states using nothing but light.
Embedded within the proper array of lasers (it would take at least two), such a crystal would represent something approaching the theoretical limit of data density in a storage medium: each bit would be represented by a single molecule.
… at least one of these proteins, known as IrisFP, actually has the ability to store data in four different states, versus the two different states (on and off) encoded by a traditional bit. In other words, this protein could store data in base 4 instead of base 2.
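To make that concrete, here's a minimal sketch (ours, purely illustrative) of what base 4 buys you: each 4-state symbol carries two bits, so a byte needs only four symbols instead of eight.

```python
# Illustrative only: pack bytes into base-4 symbols (0-3), the four
# states a protein like IrisFP could in principle represent, then
# unpack them again.

def to_base4(data: bytes) -> list[int]:
    """Each byte becomes four base-4 symbols (2 bits per symbol)."""
    symbols = []
    for byte in data:
        for shift in (6, 4, 2, 0):       # high bits first
            symbols.append((byte >> shift) & 0b11)
    return symbols

def from_base4(symbols: list[int]) -> bytes:
    """Reassemble bytes from groups of four base-4 symbols."""
    out = bytearray()
    for i in range(0, len(symbols), 4):
        byte = 0
        for sym in symbols[i:i + 4]:
            byte = (byte << 2) | sym
        out.append(byte)
    return bytes(out)

msg = b"hi"
packed = to_base4(msg)
print(packed)                 # 8 symbols for 2 bytes, vs. 16 binary bits
assert from_base4(packed) == msg
```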
As disk sizes grow, so does our risk of losing massive amounts of data to hard drive failures or theft. A new infographic on YouTube illustrates the risks with some truly scary statistics:
- Your hard drives have a 1-in-10 chance of failing this year (see the quick calculation after this list)
- Human error and faulty media are the two leading causes of data loss
- Only 1 in 20 companies that suffer a serious data loss will remain in business
- The average time to be “up and running” after a restore is 4 hours
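Taking that 1-in-10 annual failure rate at face value, the risk compounds quickly once you own more than one drive. A quick back-of-envelope calculation (ours, not the infographic's):

```python
# If each drive independently has a 10% chance of failing this year,
# the chance that at least one of n drives fails is 1 - 0.9**n.

p_fail = 0.10
for n in (1, 5, 10, 20):
    at_least_one = 1 - (1 - p_fail) ** n
    print(f"{n:2d} drives -> {at_least_one:.0%} chance of at least one failure")
# 1 -> 10%, 5 -> 41%, 10 -> 65%, 20 -> 88%
```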
See the full video for all the details.
The new digital formats and stereoscopic tools used in Avatar generated unprecedented quantities of data that had to be stored and transferred not only between computers but between entire studios. Isilon provided a clustered IQ filer setup, which it describes in an article at The Register.
An Isilon release says: “The Avatar production generated terabytes of data in various formats, including massive digital files used in creating Avatar’s all-digital, virtual filming environment, small metadata and instructional files, still frames for review, and large media files from Avid systems.” Those terabytes were generated weekly, and sometimes daily.
But Isilon wasn’t the only provider of storage equipment, as NetApp was involved as well.
Weta used NetApp kit to store the incoming data, then used a huge number of workstations and bladed servers – with 30,000 cores in total – to work on it. The NetApp filers were fitted with up to five 160GB DRAM cache accelerator cards in their controllers, the PAM (Performance Acceleration Modules) caches, to speed file access by the Weta creative people and the servers.
HotHardware got its hands on the new Fusion-io ioXtreme PCI Express SSD solutions, and ran them through a nice benchmark suite. The results are still being processed, but they had this to say already:
It’s true, your eyes do not deceive you. You’re looking at a product that offers 300MB/sec average write throughput and 750 – 800MB/sec of average read throughput. We’re going to strap on a drool bib and get back to testing these bad boys a bit more. RAID 0 anyone? Stay tuned!
Last month (August 14th), we published a feature story from Paul Adams about the ioXtreme card from Fusion-io. From his article:
The new 80 GB ioXtreme from Fusion-io was recently displayed at SIGGRAPH 2009 in New Orleans, LA. The ioXtreme is a solid-state drive (SSD) that fits into a x4 PCI Express slot. The beauty of using the PCI Express slot is that you can really obtain great performance. Fusion-io is claiming that their drive can achieve a write bandwidth of 500 MB/s and a read bandwidth of 280 MB/s.
While those numbers match the performance data given to us in phone interviews and the datasheet they handed out at SIGGRAPH (we still have it, and verified it), Fusion-io contacted me today requesting a correction:
- Read bandwidth: 697 MB/s (64 KB packets)
- Write bandwidth: 288 MB/s (64 KB packets)
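Those corrected figures also pass a rough sanity check against the bus. Assuming first-generation PCIe signaling at roughly 250 MB/s of usable bandwidth per lane per direction (our assumption; Fusion-io hasn't published this breakdown), a x4 slot tops out around 1 GB/s each way:

```python
# Rough sanity check: corrected ioXtreme figures vs. an assumed
# PCIe 1.x x4 ceiling (~250 MB/s usable per lane, per direction).

lanes = 4
per_lane_mb_s = 250                  # assumed usable PCIe 1.x bandwidth
ceiling = lanes * per_lane_mb_s      # ~1000 MB/s in each direction

read_mb_s, write_mb_s = 697, 288     # Fusion-io's corrected numbers
print(f"reads use  ~{read_mb_s / ceiling:.0%} of the x4 ceiling")
print(f"writes use ~{write_mb_s / ceiling:.0%} of the x4 ceiling")
```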
The changes have been integrated into Paul’s original article; go check it out for details on the fastest SSD solution on the market.
One common problem you hear about a lot with other SSD systems is “silent failures”. As the cells in the SSD chips die, you eventually lose the ability to write data to them. Even with the best parity and error-checking systems, it’s possible for groups of cells to fail at once and result in bad reads. Fusion-io has addressed this problem on several fronts:
- They’ve implemented a RAID controller in the chip so that reads and writes are dispersed among several chips, improving bandwidth and reducing the number of writes to any single chip (see the sketch after this list)
- 11-bit parity for error checking
- “Read Only” fallback. In the event that a portion of the memory dies, that section becomes “Read Only”, allowing safe reads of your data for backup.
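To illustrate the dispersal-plus-parity idea in miniature (a toy model, not Fusion-io's actual controller logic): stripe each write across several chips and keep an XOR parity block, so a single dead chip's data can be rebuilt from the survivors.

```python
# Toy RAID-style striping: data blocks spread across chips plus
# one XOR parity block; any single lost block is recoverable.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def write_stripe(blocks: list[bytes]) -> list[bytes]:
    """Return the stripe with a trailing parity block appended."""
    parity = blocks[0]
    for blk in blocks[1:]:
        parity = xor_blocks(parity, blk)
    return blocks + [parity]

def recover(stripe: list[bytes], dead: int) -> bytes:
    """Rebuild the block on a failed chip from the survivors."""
    survivors = [blk for i, blk in enumerate(stripe) if i != dead]
    rebuilt = survivors[0]
    for blk in survivors[1:]:
        rebuilt = xor_blocks(rebuilt, blk)
    return rebuilt

stripe = write_stripe([b"AAAA", b"BBBB", b"CCCC"])  # 3 data + 1 parity chip
assert recover(stripe, dead=1) == b"BBBB"           # chip 1 died; data survives
```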
It’s an impressive card, but not cheap. Currently selling for approximately $30/GB, it’s not for the faint of heart. But SSD prices are falling fast, and SSDs are showing up in more and more systems, so cards like the Fusion-io will become more prevalent (and more affordable) in the near future.
It’s been a year since the Data Intensive Cyber Environments (DICE) group moved from UC San Diego to UNC Chapel Hill, and in that time they’ve worked closely with the Renaissance Computing Institute (RENCI) to build a network of data repositories across the state called the “Data Grid”.
When completed, the Data Grid in action might work like this: Data on development patterns across North Carolina would be stored at RENCI at UNC Charlotte, where researchers at the RENCI engagement center study urban growth patterns and their implications. An urban planner in eastern North Carolina would be able to access that data as well as the software tools that allow it to be viewed in a visual, intuitive format. Those same researchers also would be able to access coastal floodplain maps and storm surge visualizations stored at other data hubs and to use all of the information to plan sustainable coastal developments.
A great solution for visualization as a decision-making tool, if they can deploy it to a wide enough audience.
Mozy, a popular online backup service, has created an interesting infographic visualizing various statistics related to hard drive storage. Combining numbers on the falling price of data storage, how much you can fit into various storage sizes, and the increasing amount of computer usage, it’s a huge graphic that contains several little tidbits of information.
Anyone working in data analysis and visualization will tell you that the #1 problem facing them is file storage. As datasets get bigger and bigger, moving them from the HPC systems to the visualization resources becomes a bigger pain. Oak Ridge National Laboratory has been facing this problem for a while now, and has just recently stood up a distributed file server named ‘Spider’ to fix this.
Once a project ran an application on Jaguar, it then had to move the data to the Lens visualization platform for analysis. Any problem encountered along the way would necessitate that the cumbersome process be repeated. With Spider connected to both Jaguar and Lens, however, this headache is avoided. “You can think of it as eliminating islands of data. Instead of having to multiply file systems all within the NCCS, one for each of our simulation platforms, we have a single file system that is available anywhere. If you are using extremely large data sets on the order of 200 terabytes, it could save you hours and hours.”
While this is nice, it still doesn’t solve the problem of then maintaining that data in memory. But at least you don’t have to spend a month waiting for an FTP transfer to finish anymore.
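To put “hours and hours” in perspective, here’s a back-of-envelope estimate (our figures, not ORNL’s) of what copying a 200 TB dataset between file systems would cost at various sustained rates:

```python
# Back-of-envelope: time to copy a 200 TB dataset at various
# sustained transfer rates (illustrative figures, not ORNL's).

dataset_bytes = 200 * 10**12         # 200 TB

rates = [("1 GbE (~100 MB/s)", 100),
         ("10 GbE (~1 GB/s)", 1_000),
         ("parallel FS (~10 GB/s)", 10_000)]

for label, mb_per_s in rates:
    hours = dataset_bytes / (mb_per_s * 10**6) / 3600
    print(f"{label:22s} -> {hours:7.1f} hours")
# ~556 h at 100 MB/s, ~56 h at 1 GB/s, ~5.6 h at 10 GB/s
```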
Update: I spoke with a source at ORNL, and they corrected a few things:
- Spider isn’t new, it’s been around for at least a year.
- It’s 10.7 petabytes
- They don’t use FTP, they use SCP & HSI
So if it’s not new, why the press release? Not really sure, to be honest. My suspicion is that it was previously in a testing mode and has just officially entered “production” and general availability.