Stories from February 2nd, 2011

Textual analysis in parallel with ParaText

The brains behind one of my favorite visualization tools, ParaView, the guys at Sandia National Labs, have turned their sights on new prey: textual analysis. Their new tool, “ParaText”, can process huge collections of text in parallel across supercomputers, churning through 500-million-word collections in under a day. (For comparison, War and Peace is only 560,000 words.)

ParaText distributes a different subset of documents to each processor, which in turn analyzes that subset. Because the developers worked to minimize communication between processors and keep ParaText scalable, the result is a tool that can run in a variety of environments, including on a grid or cloud. It can be embedded in any application through a native C++ API, Python, or Java. A standalone ParaText MPI executable can be run from the command line. Or ParaText can be deployed as a web service using a RESTful API.
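To make the "different subset per processor" idea concrete, here is a rough sketch in plain Python. It is not ParaText's API or its actual algorithm: the function names (`partition`, `count_terms`, `analyze`) are made up for illustration, simple word counting stands in for the real analysis, and a thread pool stands in for the MPI processes. The point is only the shape of the work: each worker handles its own documents independently, and results are combined in a single merge step at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(documents, n_workers):
    """Deal documents round-robin into n_workers disjoint subsets."""
    return [documents[i::n_workers] for i in range(n_workers)]


def count_terms(subset):
    """Work done independently by one worker: word counts for its subset."""
    counts = Counter()
    for text in subset:
        counts.update(text.lower().split())
    return counts


def analyze(documents, n_workers=4):
    """Partition, count locally on each worker, then merge the results once."""
    subsets = partition(documents, n_workers)
    # Hypothetical stand-in for MPI ranks: one thread per subset.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_terms, subsets))
    # The only "communication" step: combine the per-worker counts.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Calling `analyze(list_of_texts, n_workers=8)` returns one combined `Counter`. Because workers never talk to each other while counting, adding more of them mostly just shrinks each subset, which is the property that lets this kind of design scale.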

ParaText is based on the existing Titan Informatics Toolkit, created by Sandia and Kitware.

via Textual analysis in parallel | iSGTW.


VizWorld.com is a production of VizWorld, LLC © 2009