The latest issue of the journal ‘nature|methods’ contains five special articles all about visualization of biological data.
A series of five commissioned Reviews discuss the challenges of visualizing biological data and the visualization tools available to biologists working with genomes, alignments and phylogenies, macromolecular structures, images and systems biology data.
The articles are:
- Foreword: Supplement on Visualizing Biological Data, by Daniel Evanko
- Commentary: Visualizing biological data – now and in the future
- Review: Visualizing Genomes: Techniques and Challenges
- Review: Visualization of multiple alignments, phylogenies, and gene family evolution
- Review: Visualization of image data from cells to organisms
- Review: Visualization of macromolecular structures
- Review: Visualization of omics data for systems biology
Far too many names to include here, but all articles are available in Abstract, Full Text, and downloadable PDF. Go check 'em out.
via Table of contents : Nature Methods.
In the newest issue of BMC Bioinformatics 2009 (10:452), Josiah Seaman and John Sanford publish details for their 2-D genome visualizer called ‘Skittle’.
Results: This program first creates a 2-dimensional nucleotide display by assigning four colors to the four nucleotides, and then text-wraps to a user adjustable width. This nucleotide display is accompanied by a “repeat map” which comprehensively displays all local repeating units, based upon analysis of all possible local alignments.
Skittle includes a smooth-zooming interface which allows the user to analyze genomic patterns at any scale. Skittle is especially useful in identifying and analyzing tandem repeats, including repeats not normally detectable by other methods. However, Skittle is also more generally useful for analysis of any genomic data, allowing users to correlate published annotations and observable visual patterns, and allowing for sequence and construct quality control.
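To get a feel for why text-wrapping exposes tandem repeats, here is a minimal Python sketch of the idea. It is not Skittle's code: the color names, the function names, and the `column_agreement` score are all assumptions made for illustration, and the score is only a crude stand-in for the "repeat map" described above. When the wrap width matches the repeat length, every base lines up with an identical base in the row above, so the colors form vertical stripes.

```python
# Hypothetical palette; Skittle's actual color assignment may differ.
COLORS = {"A": "black", "C": "red", "G": "green", "T": "blue"}


def wrap_sequence(seq, width):
    """Text-wrap a nucleotide string into rows of at most `width` color names."""
    if width < 1:
        raise ValueError("width must be at least 1")
    seq = seq.upper()
    return [
        [COLORS.get(base, "gray") for base in seq[start:start + width]]
        for start in range(0, len(seq), width)
    ]


def column_agreement(seq, width):
    """Fraction of bases equal to the base `width` positions later.

    In the wrapped display this is the base directly below, so a value
    near 1.0 means the picture shows vertical stripes at this width.
    """
    seq = seq.upper()
    pairs = len(seq) - width
    if pairs <= 0:
        return 0.0
    matches = sum(seq[i] == seq[i + width] for i in range(pairs))
    return matches / pairs
```

Scanning `column_agreement` over a range of widths and looking for peaks is one simple way to guess a repeat length before wrapping the display to it.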
Skittle is freely available from their Sourceforge page, and they have a great document with example visualizations in this PDF.
via Skittle: A 2-Dimensional Genome Visualization Tool – 7thSpace Interactive.
While sequencing the genomes of complex organisms, like humans, remains a costly and time-consuming process, simpler organisms, like bacteria, can be sequenced relatively trivially. A new paper featured in the upcoming issue of Nature covers the results of 200 dispersed organisms and shows a fascinating genomic graph of the results.
A new paper takes an approach that’s less driven by self-interest. Its authors surveyed hundreds of strains of bacteria and archaea that we know how to culture, and picked 200 of them that are broadly dispersed across the tree of life, based on the sequence of a ribosomal RNA gene. They’re now in the process of completing the genomes of all of them, and the paper serves as an interim report.
via Presenting a genomic encyclopedia of bacteria (and archaea).
Biologists out there might want to take a look at the open-source MizBee visualization tool.
MizBee is a multiscale synteny browser for exploring conservation relationships in comparative genomics data. Using side-by-side linked views, MizBee enables efficient data browsing across a range of scales, from the genome to the gene. The design of MizBee is grounded in perceptual principles, and includes several techniques such as edge bundling and layering to enhance visual cues about conservation relationships related to proximity, size, similarity, and orientation.
It uses a dual-ring visualization where the outer ring is source chromosomes, and the inner ring is destination chromosomes. It’s difficult to explain, so just head on over to their site and watch the quicktime demonstration video.
Human genetics research is slow, and one primary cause is the combinatorial algorithms it relies on. Researchers at Dartmouth have just published a paper in which they implemented these algorithms on NVIDIA graphics cards with CUDA, with amazing results.
One such algorithm is Multifactor Dimensionality Reduction (MDR). Expert knowledge guided evolutionary computing wrappers around MDR have previously been shown to be a powerful way to efficiently analyze datasets for interactions. Evolutionary computing can effectively address some of the challenges these datasets present. Unfortunately examining the statistical significance of results requires permutation testing, which increases the computation requirements by a factor of 1000. Here we implement an expert knowledge guided ant system on graphics processing units (GPUs) and show that the GPU implementation makes the rigorous statistical analysis of large datasets practical.
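The factor of 1000 comes straight from how permutation testing works: the whole analysis is rerun once per shuffled copy of the case/control labels. The Python sketch below shows that structure only. It is not the paper's method: `mean_difference` is a toy stand-in for MDR, and the function names, the default of 1000 permutations, and the add-one correction are assumptions for illustration.

```python
import random


def permutation_p_value(statistic, features, labels, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones.

    `statistic` is called once on the real labels and once per permutation,
    which is where the roughly 1000-fold increase in computation comes from.
    """
    rng = random.Random(seed)
    observed = statistic(features, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(features, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def mean_difference(features, labels):
    """Toy statistic (not MDR): absolute gap between case and control means."""
    cases = [x for x, y in zip(features, labels) if y == 1]
    controls = [x for x, y in zip(features, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))
```

Each permutation is independent of the others, which is part of what makes this kind of workload a natural fit for parallel hardware such as GPUs.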
The paper was recently published in GECCO’09, and can be viewed below.
10.pdf (application/pdf Object).
In an effort to locate a genetic basis for schizophrenia, the National Center for Genome Resources (NCGR) in Santa Fe, New Mexico established the Schizophrenia Genome Project. Taking genetic data from 14 patients and 6 controls, they found themselves searching for 11,500 candidate genes amongst 16.7 billion bases. How to find them? Statistical analysis and visualization.
NCGR analysts used principal components analysis and hierarchical clustering to assess the data. The variance attributable to disease status was higher for the Illumina digital expression data than from conventional array analysis. “Visualization tools, such as Principal Component Analysis, readily separated the cases and controls, we spotted differences right away,” says Schilkey.
via Bio-IT World.
A new website, Diseasome, visualizes 516 diseases and 903 genes to show common backgrounds and effects. The result is an interactive map showing how groups of diseases, say cancers, share a common background and genetic material but manifest in slightly different ways. There is also a wealth of information about how they did it:
Nodes are positioned on the map according to a topological placement algorithm, i.e. each node is positioned solely according to its linking pattern. Many softwares are available for doing this. Gephi has been chosen for its high quality algorithm ForceAtlas.
Many algorithms make possible a 2D rendering of an adjacency matrix – i.e. the matrix describing any graph. We used a ForceAtlas algorithm, which shares with all the others the same basic principle: minimizing the system’s energy while maximizing the use of the space available for the representation of the data. To minimize the system’s energy, one can for instance assume that nodes that are not linked to each other are pushing away from each other whereas nodes that are linked to each other are attracting each other. Through iterative steps the algorithm tries to find a way to position nodes where there is as little link overlap as possible. To maximize the use of the mapped space, the graph is spread as much as possible over the surface allocated for its display.
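The push-and-pull principle is easy to try out. The following Python sketch is a generic force-directed layout, not Gephi's ForceAtlas: the force formulas, the parameter names (`step`, `max_move`), and their defaults are assumptions chosen for illustration. It uses a common variant in which every pair of nodes repels and linked pairs additionally attract, so linked nodes settle at a finite distance instead of collapsing onto each other.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, step=0.05, max_move=0.1, seed=0):
    """Return {node: [x, y]} after a fixed number of force-directed iterations.

    `nodes` is a list of hashable ids and `edges` a list of (a, b) pairs.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    linked = {frozenset(edge) for edge in edges}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Positive pull draws the pair together, negative pushes it apart.
                pull = -1.0 / dist          # every pair repels
                if frozenset((a, b)) in linked:
                    pull += dist            # linked pairs also attract
                fx, fy = pull * dx / dist, pull * dy / dist
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                # Cap each move so very close nodes do not fly apart.
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return pos
```

With these particular formulas, two linked nodes on their own come to rest about one unit apart, where attraction and repulsion cancel. The pairwise loop is quadratic in the number of nodes, which is fine for a toy graph but is one reason real layout tools use more refined algorithms.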
You can view the map at their site, http://diseasome.eu/, as well as buy a Poster or the Book.
David Cox of North Carolina State University has created a new visualization method for DNA sequences that he’s calling “symbolic scatter plots”.
His technique starts out similar to Blast, he says, in that it takes the sequence at hand and breaks it up into small words. Whereas Blast computationally plugs those words into a database to find similar matches, his method simply maps the words. In his case those words are 3-mers that correspond to one of 64 possible choices because there are 64 possible combinations of three nucleotides. Each 3-mer is represented as a point on the scatter plot, zero through 63, with that number serving as the y-coordinate. The x-axis is the order that the 3-mer appears in the genetic sequence. Cox designed the symbolic scatter plot so that those 3-mers that correspond to the same amino acid are adjacent to one another.
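The mapping from sequence to points can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cox's implementation: the function name, the sliding window controlled by `stride`, and the plain base-4 numbering of 3-mers are all hypothetical. In particular, the base-4 numbering does not reproduce his ordering, which places 3-mers for the same amino acid next to each other.

```python
# Assumed base order; it only fixes which 3-mer gets which number.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_points(seq, k=3, stride=1):
    """Return (x, y) points: x is the word's order of appearance, y its number.

    With k=3 there are 4**3 = 64 possible words, so y runs from 0 to 63.
    Words containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    points = []
    for order, start in enumerate(range(0, len(seq) - k + 1, stride)):
        word = seq[start:start + k]
        if any(base not in BASE_INDEX for base in word):
            continue
        y = 0
        for base in word:
            y = 4 * y + BASE_INDEX[base]  # read the word as a base-4 number
        points.append((order, y))
    return points
```

Feeding the resulting points to any scatter-plot tool gives the basic picture; swapping the base-4 number for a lookup table ordered by amino acid would be the step needed to approximate the grouping described above.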
It’s a good use of “human in the middle” visualization where he’s attempting to replace completely automated systems with tools to make use of the superior human vision system to better discover patterns. His work will be presented at the 2009 International Conference on Bioinformatics and Computational Biology in Las Vegas.
via Symbolic Scatter Plot Helps Visualize Patterns Within DNA Sequence | Genome Technology | Sequencing | GenomeWeb.