Google has released a new open-source tool called ‘Google Refine’ that aims to make cleaning up messy datasets a breeze. The project’s own description is a bit sparse:

Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.

Essentially, it’s desktop software that you access through a web browser, so you don’t have to upload your data to Google to benefit from it. It uses lots of ‘intelligent’ algorithms to fix common data problems such as bad field alignment, format inconsistencies, and mangled input. It’s really meant for database-style input (rows and columns), but it could be great for cleaning up user-survey responses or large downloaded datasets. The main workflow is to first run a filter to select just the part of the data you want, then apply a single operation to that group. That operation can be anything from ‘delete’ (to remove offending rows from a cut-and-pasted wiki entry) to ‘reformat’ (to convert lines into tables). Watch the videos after the break for more details.
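To make the filter-then-operate idea concrete, here is a minimal Python sketch of the same pattern. It is not Google Refine's API or code; the function `filter_then_apply`, the sample rows, and the `[edit]` residue are all made up for illustration.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def filter_then_apply(
    rows: List[Row],
    matches: Callable[[Row], bool],
    operation: Callable[[Row], Optional[Row]],
) -> List[Row]:
    """Apply one operation to the rows selected by a filter.

    Hypothetical helper: rows outside the filter pass through unchanged,
    and an operation that returns None deletes the row.
    """
    result = []
    for row in rows:
        if not matches(row):
            result.append(row)  # not selected by the filter: leave as-is
            continue
        changed = operation(row)
        if changed is not None:  # None means "delete this row"
            result.append(changed)
    return result


# Invented sample data, including a stray "[edit]" row of the kind
# that a cut-and-pasted wiki table might leave behind.
rows = [
    {"name": "  Alice ", "city": "new york"},
    {"name": "[edit]", "city": ""},
    {"name": "Bob", "city": "Boston"},
]

# Step 1: filter to the offending rows, then delete that group.
without_residue = filter_then_apply(
    rows,
    lambda r: r["name"] == "[edit]",
    lambda r: None,
)

# Step 2: filter to rows with an all-lowercase city, then reformat that group.
cleaned = filter_then_apply(
    without_residue,
    lambda r: r["city"].islower(),
    lambda r: {"name": r["name"].strip(), "city": r["city"].title()},
)

print(cleaned)
```

Each pass pairs one filter with one operation, which is the shape of workflow described above; the real tool does this interactively in the browser rather than through code like this.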

via google-refine – Project Hosting on Google Code.