Trifacta has positioned itself as the company that can speed up the process of analyzing data by starting with data transformation right at the source, whether the data is stored in Hadoop, JSON or CSV, for example.

In this exclusive interview and demonstration of Trifacta’s Data Transformation Platform v2, cofounder and CTO Sean Kandel explains what’s behind Trifacta’s product and who its intended users are, and demonstrates how the visualization tools and machine learning speed up the initial examination and preparation of data for further (and final) analysis.

In 2012, while at Stanford, Sean took part in research comprising 35 interviews of data analysts from 25 organizations across a variety of sectors. The research found that the analysts spent the bulk of their time transforming data into a usable form rather than looking for insights. This data transformation process hinders the movement of data from stores like Hadoop to analytics tools. Trifacta was born to use Predictive Interaction and machine learning to lift the burden of preparing and transforming data.

Advanced Visual Data Profiling
Before data scientists, IT programmers, and business analysts can start to manipulate their data for analysis, they must work through the time-consuming challenge of profiling to ensure the fit and accuracy of data for analysis. To date, this process has largely consisted of manual testing and programming. Trifacta v2 uses a combination of machine learning and interactive data visualization techniques to automatically evaluate the distribution and statistical relevance of data and provide analysts with immediate visibility into unique elements of the data set like data distributions, gaps in data collection, and unusual skew of the data.
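To make the idea of profiling concrete, here is a minimal sketch in Python of the kind of per-column checks described above: counting gaps, listing the most common values, and measuring skew. This is not Trifacta's implementation or API; the function name profile_column and the choice to treat empty strings and None as missing are assumptions made purely for illustration.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: gaps, most common values, and skewness.

    Hypothetical helper for illustration only; it is not part of Trifacta.
    Empty strings and None are treated as missing values (an assumption).
    """
    present = [v for v in values if v is not None and v != ""]

    # Try to read every present value as a number.
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            pass

    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
        "top_values": Counter(present).most_common(3),
        "skewness": None,
    }

    # Only compute skewness when the whole column is numeric.
    if len(numeric) >= 3 and len(numeric) == len(present):
        n = len(numeric)
        mean = sum(numeric) / n
        variance = sum((x - mean) ** 2 for x in numeric) / n
        if variance > 0:
            third_moment = sum((x - mean) ** 3 for x in numeric) / n
            # Population skewness: third central moment over variance^(3/2).
            profile["skewness"] = third_moment / variance ** 1.5

    return profile


if __name__ == "__main__":
    print(profile_column(["1", "2", "", "3", None, "100"]))
```

A strongly positive skewness, as produced by the single large value in the sample column, is the sort of signal an analyst would otherwise have to find through manual testing.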

The platform uses a data scripting language called Wrangle. As the user goes through the Tableau-like visual interface, spotting anomalies, correlating outliers, and verifying what’s useful, the machine learning picks up these patterns and allows the resulting transformations to be applied from small samples throughout the entire cluster. Trifacta therefore becomes part of the data analyst’s (or business analyst’s) toolkit, with reports of 10X speed-ups in getting to clean and accurate analysis.

Trifacta has partnered with Cloudera, Hortonworks, Tableau and Pivotal, with the goal of clearing “traditional bottlenecks to transforming raw data into actionable data”. This is done either by applying the transformations transparently to data stored on platforms such as Cloudera’s Distribution of Apache Hadoop (CDH) or Hortonworks Data Platform (HDP 2), or by making the transformed data available to products like Tableau, as Tableau Data Extracts (TDEs), for analysis and the production of visualizations.

In the first 5 minutes of the interview, Sean and I discuss the background behind Trifacta’s Predictive Interaction and visual data profiling and their purpose. This is followed by a detailed demonstration of the Data Transformation Platform in its latest release (v2) working with an existing unedited data set, showing Wrangle and machine learning in action.

 

For more information about Trifacta, please visit their website at www.trifacta.com