TACC’s Kelly Gaither gave a nice presentation in the Dell booth at SC on the trials and tribulations of performing data analysis and visualization “at scale.” In her context, “at scale” refers to large, HPC-scale datasets.
Visualization is one of the most important and commonly used methods of analyzing and interpreting digital assets. For many types of computational research, it is the only viable means of extracting information and developing understanding from data. However, non-visual data analysis techniques—statistical analysis, data mining, data reduction, etc.—also play integral roles in many areas of knowledge discovery.
TACC is using the same approach I’ve begun deploying at my employer: dedicated visualization resources combined with large shared filesystems (eliminating file transfers) and client-server tools. Her talk focuses on TACC’s software (Longhorn Portal) and hardware (Longhorn and Stallion) deployments, but unfortunately offers little detail on the impact of the system beyond vague “works great” remarks. It’s a good talk if you’re unfamiliar with the problems of interactive visualization at the tera/petascale, and Kelly is always fun to listen to.
TACC is bringing a new supercomputer called “Stampede” online, which aims to push the limits of Linux cluster design and further the NSF’s “Extreme Digital” program. It will provide a peak of 2 petaflops, with another 8 petaflops of potential performance from Intel’s Knights Corner chips, so you may be wondering what they plan to use for visualization and analysis.
In addition, Stampede will offer 128 next-generation Nvidia graphics processing units (GPUs) for remote visualization, 16 Dell servers with 1 terabyte of shared memory and 2 GPUs each for large data analysis, and a high-performance Lustre file system for data-intensive computing.
Many people in the field (myself included) believe that the days of dedicated graphics processing systems are numbered, but Stampede suggests they have some life left in them. Still, I have to wonder what percentage of the system’s $50M cost is tied up in those sixteen 1-terabyte nodes.
The recent Spring 2011 issue of “HPCsource”, a supplement to Scientific Computing, focuses entirely on visualization, with articles on GPGPU programming and remote 3D visualization, interviews with experts like Kelly Gaither of TACC, and much more. The online digital version includes videos for several sections, and the whole magazine is also available as a downloadable PDF.
The issue is sponsored by Dell, so you’ll have to put up with several full-page Dell ads for hardware and services, but the articles look really interesting.
Kelly Gaither is the Director of Data & Information Analysis at the Texas Advanced Computing Center (TACC) at the University of Texas, and she was in the Dell booth at SC10 last year touting the impact of visualization in HPC. HPCatDell has an interview with her on YouTube.
“In the visualization, we are exploiting what our brain does every day,” she says, adding that the technology and people who harness it are assisting in curing cancer, understanding how aircraft fly and helping scientists predict hurricanes. One collaboration project, the Longhorn Project, has deployed the largest remote interactive visualization cluster in the world.
She was in the Dell booth because TACC runs some rather large Dell clusters. I’ve known her for several years and have heard her speak at many events, and she really is a driving force in HPC visualization, both for her work on large data and parallel systems and for developing “superdisplays” composed of large tiled arrays of monitors.
The National Archives has the mission of cataloging millions of records every year, ranging from the important (presidential speeches and decrees) to the mundane (internet tweets), and with the explosive growth of digital media it has found itself in a bit of a quandary. First, how do you store the massive amounts of data we generate every day? Then, how do you find anything inside that giant mountain of data? The Texas Advanced Computing Center (TACC) partnered with the NSF to create new digital archival and visualization technology, described in an article on the TACC website.
“Archival analysis is a multi-layered process and it is unique to each collection that is being assessed,” explained Maria Esteva, a digital archivist and data management and collections researcher at TACC. “We are conducting research to map analysis processes used by archivists onto a visualization that combines data driven analysis tools. In this way, the archivist can integrate his or her experience into the workflow.”
The first step in the project was to represent a large and heterogeneous archival collection.
“We are all familiar with desktop icons, representing folders and files,” Esteva said. “But imagine a screen clogged with millions of such icons, with little clue as to what is inside. It takes a visual representation to show millions of files at a time.”
TACC will be demonstrating some neat visualization technology at SC10 this week, including the EnVision software they’ve shown at the last few conferences, a nice 3×3 tiled display, and some 3D visuals.
Between these talks, TACC staff will demonstrate leading-edge technologies developed by researchers at the center, such as the EnVision remote visualization software package, methods for ray tracing, stereoscopic molecular visualization, portals for plant biologists, and data-centric cyberinfrastructure.
The booth will also feature stunning visualizations and digital art on Colt (a 3×3 tiled display powered by a Dell graphics cluster that enables very large data sets at high resolution), and 3D animations and demonstrations on Mustang, TACC’s 82″ stereoscopic display.
Over at the Texas Advanced Computing Center (TACC), they’ve got a new machine online named ‘Longhorn’ that boasts all of the usual HPC numbers: a 210 TB local file system, 13.5 TB of memory, and 2,048 compute cores (256 nodes). What makes it a bit different is that it also has two GPUs on each node, 512 GPUs in all for visualization and GPU computing, bringing its total to 597 teraflops.
“Longhorn is an impressive machine,” Fogal said. “Using 256 GPUs, we volume rendered data larger than two terabytes, which is among the largest-ever published volume renderings. Such renderings would normally require hundreds of thousands of CPU cores otherwise. Our success was in major part due to the large number of GPUs available, in addition to TACC’s helpful staff that aided us in accessing them for our visualization work.”
But that’s not all. In addition to the horsepower, it’s got the sexy UI to bring it to the masses. Users have access to the ‘Longhorn Visualization Portal’, a web-based VNC client that connects to TACC’s queueing system to provide direct visual access to a complete desktop, letting you run any application you want, whether CUDA code or visualization tools.
An article on the University of Texas website talks about the TACC Vislab, home of several large, cutting-edge displays. While I’m sure the article is meant to show how the large displays help advance scientific research and understanding, it gets off to a rocky start by pitching the lab as a tour stop.
Though primarily intended for scientific research, the Vislab has become a dramatic attraction for students, artists, humanities researchers and university officials, who frequently showcase the lab on VIP tours. From analysis of brain scans to student film festivals, the lab is now a hub for novel research and presentations across the sciences and humanities.
However, if you keep reading, you’ll eventually reach some of the research these large displays support.
Bajaj and co-workers have developed a computational method that transforms incredibly high-resolution microscopy scans of a mouse brain — a part of the hippocampus believed to be important in memory — into a wiring diagram of the type used by electrical engineers.
Because the microscopy technology used to explore the hippocampus creates images of incredible size and resolution, the Stallion tiled display in the Vislab was the only place where Bajaj and his group could see the multi-Gigapixel images at full nanometer-resolution scale.
Scientific visualization tools are unnecessarily complicated to use. This complexity increases the time required to gain insight into a given data set, and thus inhibits casual use. The difficulty arises from the need to support data from a large variety of sources and the need to support a wide variety of visualization algorithms. Though the number of data file formats is unbounded, the format of any given data set can be described using a small set of parameters. Further, the set of visualization algorithms applicable to a given type (e.g. dimensionality) of data is small and the number of these algorithms commonly used in a specific scientific domain is even smaller. These two insights have led the Texas Advanced Computing Center (TACC) to the development of a new tool for scientific visualization. This tool dramatically simplifies data importation and visualization algorithm selection through user-directed semi-automation. The strategy is consistent with a larger trend in data analysis and visualization towards ease of use.
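The abstract’s claim that a data set’s format “can be described using a small set of parameters” is easy to illustrate for the simplest case, a raw binary grid of scalars. The Python sketch below is purely hypothetical and is not EnVision’s actual interface: the class name `RawFormat` and its fields are assumptions chosen for illustration. It describes a file with just four parameters (grid dimensions, scalar type, byte order, header size) and uses them to decode the bytes and locate a grid point.

```python
import struct
from dataclasses import dataclass


# Hypothetical descriptor (not EnVision's API): a few parameters are enough
# to interpret a raw binary grid of scalars with the first axis varying fastest.
@dataclass(frozen=True)
class RawFormat:
    dims: tuple            # grid size, e.g. (nx, ny, nz)
    scalar: str            # struct code: 'f' float32, 'd' float64, 'h' int16, ...
    little_endian: bool    # byte order of the file
    header_bytes: int = 0  # bytes to skip before the data starts

    def count(self):
        """Number of scalars in the grid."""
        n = 1
        for d in self.dims:
            n *= d
        return n

    def struct_format(self):
        """struct format string covering the whole grid, e.g. '<12f'."""
        order = "<" if self.little_endian else ">"
        return f"{order}{self.count()}{self.scalar}"

    def decode(self, blob):
        """Skip the header and return all grid values as a flat tuple."""
        fmt = self.struct_format()
        size = struct.calcsize(fmt)
        data = blob[self.header_bytes:self.header_bytes + size]
        if len(data) != size:
            raise ValueError(f"expected {size} data bytes, got {len(data)}")
        return struct.unpack(fmt, data)

    def index(self, *ijk):
        """Flat index of grid point (i, j, k, ...), first axis fastest."""
        flat, stride = 0, 1
        for i, d in zip(ijk, self.dims):
            if not 0 <= i < d:
                raise IndexError(ijk)
            flat += i * stride
            stride *= d
        return flat


if __name__ == "__main__":
    # A 2x3x2 float32 grid behind a 4-byte header, values 0..11.
    fmt = RawFormat(dims=(2, 3, 2), scalar="f", little_endian=True, header_bytes=4)
    blob = b"HDR!" + struct.pack("<12f", *[float(v) for v in range(12)])
    grid = fmt.decode(blob)
    print(grid[fmt.index(1, 2, 1)])  # 11.0
```

The point of the design is that everything format-specific lives in the descriptor, so the reading code never changes; a tool could ask the user for only these few values (or guess them from the file size) instead of requiring a dedicated reader per format. Real formats with compression, multiple variables, or unstructured meshes need more than this.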
This tool, called EnVision, aims for an interface similar to Google Maps, simplifying the visualization process and helping to make scientific visualization a more common activity for researchers.
EnVision is a tool for remotely visualizing datasets through a web browser. It lets you transparently use remote visualization resources through a thin, web-based client from anywhere in the world.
Two days ago, NASA released a Hubble Space Telescope image of Messier 81, also known as Bode’s Galaxy. The image measures 22,620 × 15,200 pixels, over 343 million pixels in total. I noted, as a point of comparison, that the Texas Advanced Computing Center has a tiled display with 307 million pixels. Dr. Bill Barth, the director of HPC at TACC, commented on the article and posted links to Hubble images being shown on their tiled display. The first picture below shows Messier 81 on that display. Now that is a beautiful sight, even with the center monitor being nearly all white.
The second image shows the Carina Nebula, where new stars are being born. The Carina Nebula, also known as NGC 3372, is an emission nebula in the Milky Way. It is four times larger than the Orion Nebula, though it is found in the night skies of the southern hemisphere. The image was taken by the Hubble Space Telescope and combined with data from ground-based observations. It is slightly larger, at 29,566 × 14,321 pixels, or about 423 million pixels.
We would like to thank Dr. Bill Barth and TACC for these wonderful images.
TACC’s tiled display consists of a 15×5 array of Dell 30-inch widescreen LCD monitors. For more information, you can visit their website at TACC Visualization Resources.
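The 307-million-pixel figure can be checked against the 15×5 layout with a little arithmetic. The sketch below assumes each Dell 30-inch panel runs at 2560 × 1600 (the post does not state the panel resolution) and ignores bezels. It also computes how much each Hubble image would have to be scaled to fit entirely on the wall: because the wall is very wide (15 × 2560 by 5 × 1600 pixels), the limiting factor is height, not total pixel count.

```python
# Assumption: each Dell 30-inch panel is 2560 x 1600 (not stated in the post).
# Bezels are ignored, so these are addressable pixels only.
PANEL_W, PANEL_H = 2560, 1600
COLS, ROWS = 15, 5

# Image sizes quoted in the post (width, height).
IMAGES = {
    "Messier 81": (22_620, 15_200),
    "Carina Nebula": (29_566, 14_321),
}


def wall_pixels(cols=COLS, rows=ROWS, w=PANEL_W, h=PANEL_H):
    """Total addressable pixels on a cols x rows tiled display."""
    return cols * rows * w * h


def fit_scale(img_w, img_h, wall_w=COLS * PANEL_W, wall_h=ROWS * PANEL_H):
    """Largest uniform scale that shows the whole image on the wall (1.0 = native)."""
    return min(wall_w / img_w, wall_h / img_h, 1.0)


if __name__ == "__main__":
    print(wall_pixels())  # 307200000
    for name, (w, h) in IMAGES.items():
        ratio = w * h / wall_pixels()
        print(f"{name}: {w * h:,} px, {ratio:.2f}x the wall, "
              f"fit scale {fit_scale(w, h):.3f}")
```

Under these assumptions both images have more pixels than the wall, and both must be shrunk to roughly half size to be seen whole; viewing either at full resolution means panning across it.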