Maverick at TACC tackles big-scale data visualization (Interview – Part 1)
TACC (the Texas Advanced Computing Center) at the University of Texas at Austin has just deployed Maverick, a unique, powerful, high-performance visualization and data analytics resource for the open science and engineering community.
I spoke with Kelly Gaither, Ph.D., the principal investigator on the project and TACC’s director of Visualization, at the TACC Visualization Laboratory (Vislab) on campus, just a few weeks after the deployment, for an overview of Maverick and other features of TACC and the Vislab.
Kelly described Maverick as “a next generation remote and interactive visualization and data analysis resource.” Maverick was created in collaboration with NVIDIA and HP (see the specifications below) and deployed on March 3, 2014. She summarized the Maverick environment as “having supercomputing capability, but with a laptop interface.” For the user, she says, “it feels like they’re at their laptop or desktop, but it’s got the power of a supercomputing center at the back end.”
I asked who the user base would be, both locally and remotely. She described the primary local users as members of ICES (the Institute for Computational Engineering and Sciences at UT Austin), as well as others at UT who are conducting biological research and studies in the humanities.
Typically, remote users are from the national open science community, as defined by the NSF. Maverick is part of the National Science Foundation’s XSEDE (Extreme Science and Engineering Discovery Environment) project and is able to tackle large-scale cosmology, turbulent flow simulations, and global weather forecasting. TACC also provides collaborative capability, allowing interactivity between multiple researchers.
Good data visualization comes from more than powerful technology
In the interview, Dr. Gaither elaborated that TACC offers more than just the GPU supercomputing capabilities: good data visualization calls for skills over and above knowledge of statistics and one’s own field of inquiry. She focused on “bridging the gap between the information that’s trapped in that data set and the way we portray it. Visualization is evolving: it starts from the roots in scientific visualization. Shortly after that came what we would call information visualization, which really deals with the abstract data, financial data, for example, and [it's] now evolving into the era of visual analytics; where the scientist can sit down and have a combination of data analysis tools with the visualization and visual design tools to make an effective visualization to really interact with the data, barrier-free.”
She continued, “Most of what we try to do is remove the barriers for those scientists. Sometimes it’s the interface tools, sometimes it’s just the language the application speaks, but then sometimes it’s really just giving them some creative hints how to communicate the data to themselves, their collaborators, and then to their funding agencies.”
I asked her about the data that’s used for Maverick, and then about data management to feed these visualizations:
“With respect to who owns the data, we take the tack that the scientist owns the data, we don’t own that data. Sometimes it’s simulated… sometimes it’s measured. Oftentimes (almost always) we work collaboratively with them to get imagery they approve of before we show it to anyone else.”
Kelly continued by describing Stockyard, a 20-petabyte shared file system. Stockyard is shared across all of TACC’s resources, so data can be kept in one place. This avoids creating multiple copies of data sets and unifies the management of resources.
The interview contains more about TACC’s work with the scientific community, and a question about Dr. Gaither’s “passion projects.” You can view the interview above or on YouTube.
Part II of the interview will contain a short demonstration of Stallion, the huge, multi-screen high-res display seen behind Dr. Gaither and me in the interview at the Vislab.
Maverick System Specifications
- Comprised of five racks containing 132 HP ProLiant SL250s Gen 8 compute nodes and 14 HP ProLiant management, login, and Lustre router servers.
- Each of the 132 compute nodes will include two ten-core Intel Xeon E5-2680 V2 processors with 256GB of DDR3 1866MHz memory each, a Mellanox Connect-X3 FDR InfiniBand FlexibleLOM adaptor, and one NVIDIA® Tesla K40 GPU accelerator.
- A Mellanox FDR InfiniBand interconnect will provide a high-performance communication platform.
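For a sense of scale, the per-node figures in the list above can be added up into system-wide totals. The short Python sketch below is only back-of-the-envelope arithmetic and is not an official TACC tally. It assumes the 256GB of memory is per compute node, counts only the 132 compute nodes (not the 14 management, login, and router servers), and uses a hypothetical function name, maverick_totals, chosen for illustration.

```python
# Per-node figures taken from the specification list above.
COMPUTE_NODES = 132
PROCESSORS_PER_NODE = 2      # two Intel Xeon E5-2680 V2 processors per node
CORES_PER_PROCESSOR = 10     # each processor is ten-core
MEMORY_GB_PER_NODE = 256     # assumption: 256GB is the per-node total
GPUS_PER_NODE = 1            # one NVIDIA Tesla K40 per node


def maverick_totals():
    """Return aggregate totals across the compute nodes (illustrative only)."""
    cpu_cores = COMPUTE_NODES * PROCESSORS_PER_NODE * CORES_PER_PROCESSOR
    memory_gb = COMPUTE_NODES * MEMORY_GB_PER_NODE
    gpus = COMPUTE_NODES * GPUS_PER_NODE
    return {
        "cpu_cores": cpu_cores,
        "memory_gb": memory_gb,
        "memory_tb": memory_gb / 1024,  # using 1 TB = 1024 GB
        "gpus": gpus,
    }


if __name__ == "__main__":
    for name, value in maverick_totals().items():
        print(f"{name}: {value}")
```

Under those assumptions, the compute nodes together provide 2,640 CPU cores, 33,792GB (33TB) of memory, and 132 GPUs.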
Prior to Maverick’s deployment, TACC released the following comments:
“Increasingly the limiting factor in HPC systems performance is not just floating point performance but data movement,” said Scott Misage, HPC engineering director at HP. “The new Maverick system is based on HP ProLiant compute nodes with NVIDIA Tesla K40 GPUs that offer double the GPU memory of previous GPUs, minimizing the need for data movement and enabling new levels of large scale scientific data analysis.”
“Maverick is part of a growing trend in which researchers and technologists are expanding the use of accelerated supercomputers for big data analytics, in addition to traditional science,” said Sumit Gupta, general manager of Tesla Accelerated Computing products at NVIDIA. “Combining HP ProLiant servers with Tesla K40 GPUs, the world’s highest performance accelerators, will give TACC users the fastest computational horsepower for big data analytics with industry-leading visualization capabilities.”
In addition, Maverick will offer cloud-like visualization capability and analytics through the ongoing evolution of TACC’s remote visualization software. “We’ll continue to offer our remote visualization suite and services leveraging traditional OpenGL-based visualization applications (VisIt, ParaView), commercial third-party applications (EnSight, Amira), and TACC-developed applications (DisplayCluster, GLuRay),” Gaither said.