A software infrastructure for the Comparative Analysis of RNA. The infrastructure consists of a novel database, the RNA Comparative Analysis Database (rCAD), and an application for visualizing and manipulating data from rCAD, the Comparative Analysis Toolkit User Interface (CATUI).
- New - RNA Comparative Analysis Tools 3.00, Released 8/19/2010
- CATUI 3.51
- Note: - If you installed CATUI 3.5 via clickonce, you will have to uninstall it and reinstall via clickonce to get CATUI 3.51.
- rCAD Reference Implementation 1.5.1
- Note: - If you installed rCAD 1.5.0, you will have to uninstall it and reinstall to update to rCAD 1.5.1
- Note: - The 5S rRNA rCAD dataset at the CRW Site (http://www.rna.ccbb.utexas.edu/DAT/3C/Alignment/#CRW) has been updated to include 179 2-D structure models (requires rCAD 1.5.1).
- Note: - We have found issues with installing the x86 version of the rCAD Utilities on x64 Windows. In particular, the rCAD Creator and rCAD Taxonomy Updater utilities will report Registry errors. Please install the x64 version of the rCAD Utilities on x64 Windows and the x86 version of the rCAD Utilities on x86 Windows.
Publications/Citing this work
1. Doshi K.J., Gutell R.R., and Ozer S. (2010)
rCAD: The RNA Comparative Analysis Database
(manuscript submitted for publication).
Some of the figures on this site are adapted from that manuscript.
2. Doshi K.J., Gardner D.P., Cannone J.J., Gutell R.R., and Smith M. (2010)
CATUI: The Comparative Analysis Toolkit User Interface
(manuscript in preparation).
- Gutell Lab Research Site:
- Microsoft Research Spotlight:
- Computational Tools to Help Solve the Puzzle of RNA Structure
- rCAD - RNA Comparative Analysis using SQL Server (video presentation):
- Microsoft Biology Foundation:
The Gutell Lab at The University of Texas at Austin (http://www.rna.ccbb.utexas.edu) has successfully applied Comparative Analysis techniques to study ribosomal RNAs, among other RNAs (1). Comparative Analysis has been used to decipher the structure of RNA molecules (2) and to identify and characterize new RNA structural motifs. Successful application of Comparative Analysis to RNA involves the analysis of multiple, related dimensions of information: sequence and sequence alignment, structure (secondary and 3-D), and evolutionary relationships (phylogeny).
Comparative Analysis of an RNA is an iterative process involving two distinct phases: bootstrap and curation. In the bootstrap phase, an initial set of homologous sequences (closely related evolutionarily) for the RNA of interest is aligned for maximum sequence identity, and a secondary structure hypothesis is made. Selection of the initial sequence set is important: generally you need sequences with enough identity to enable a sequence-based alignment, but also enough variation to observe higher-order relationships between positions (e.g., covariations). In the curation phase, more sequences are added to the data set in an iterative process. The structure model hypothesis is continually tested, and accepted or refuted and revised, as more evolutionarily distant sequences are added to the sequence alignment. Both the bootstrap and the curation phases require the biologist to be involved in the analysis; however, certain steps can be streamlined and automated.
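To make the idea of a covariation between alignment positions concrete, here is a minimal Python sketch. It uses mutual information between two alignment columns, which is one common covariation measure; this page does not say which measure the Gutell Lab tools use, so treat the choice as an assumption. The function name `mutual_information` and the toy alignment are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j, gap="-"):
    """Mutual information (in bits) between alignment columns i and j (0-based).

    Rows with a gap character in either column are skipped.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != gap and s[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = left_counts[a] / n
        p_b = right_counts[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment (not real data): the first and last columns
# change together (G/C, A/U, C/G, U/A); the middle columns are conserved.
toy_alignment = [
    "GAAAC",
    "AAAAU",
    "CAAAG",
    "UAAAA",
]

covarying = mutual_information(toy_alignment, 0, 4)  # columns vary together
conserved = mutual_information(toy_alignment, 0, 2)  # column 2 never varies
```

A fully conserved column carries no covariation signal, which is why the initial sequence set needs some variation: in this sketch the conserved pair scores zero while the covarying pair scores high.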
The successes of the Gutell Lab and other research groups using Comparative Analysis techniques have contributed significantly to RNA science. The tremendous increase in available biological information has created new opportunities to further decipher the structure, function and evolution of cellular components, such as ribosomal RNA, but it also presents new computational challenges in obtaining adequate performance and scalability. The lack of software tools that allow biologists to interactively apply Comparative Analysis to ever-increasing amounts of biological information has become a significant bottleneck. The Gutell Lab has continually developed and redeveloped its own custom software tools to handle the ever-increasing amounts of RNA sequence and structure information. While these tools worked within the Gutell Lab, they were not designed for dissemination to the wider scientific community. The Gutell Lab, in collaboration with Microsoft Research, has started this project to develop an advanced software infrastructure for applying Comparative Analysis to large RNA datasets. These tools are to be used within the Gutell Lab and also made available, as open source, to the scientific community.
Our Comparative Analysis software infrastructure can be broken down into two separate elements. The first element is an advanced database system, built on Microsoft SQL Server 2008, capable of organizing large amounts of biological information for efficient retrieval and multi-dimensional analysis. This database system is known as the RNA Comparative Analysis Database (rCAD) (3). The rCAD database cross-indexes several dimensions of an RNA dataset: sequence, structure, and phylogenetic/evolutionary information. The rCAD database is the first biological database system capable of storing sequence alignments down to the resolution of individual nucleotides (3). Individual rows, columns and cells in an alignment can be directly included in SQL statements along with different RNA structure and phylogenetic/evolutionary information (3). By uniting multiple dimensions of information, rCAD allows new, innovative analyses of the fundamentals of RNA, such as statistics of different RNA sequence/structure motifs (Gardner, DP et al., manuscript in preparation) or the evolution of RNA structural elements (4).
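The following sketch illustrates what storing an alignment at the resolution of individual nucleotides can look like in a relational database. It is not the rCAD schema: rCAD runs on Microsoft SQL Server 2008, whereas this example uses SQLite through Python's standard `sqlite3` module as a stand-in, and every table name, column name, and data row here is a hypothetical assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per sequence, one row per alignment cell.
conn.executescript("""
CREATE TABLE sequence (
    seq_id   INTEGER PRIMARY KEY,
    organism TEXT NOT NULL,
    lineage  TEXT NOT NULL
);
CREATE TABLE alignment_cell (
    seq_id     INTEGER NOT NULL REFERENCES sequence(seq_id),
    column_no  INTEGER NOT NULL,
    nucleotide TEXT NOT NULL,
    PRIMARY KEY (seq_id, column_no)
);
""")

# Toy data, not real sequences. Gaps ("-") are simply not stored as cells.
toy_rows = [
    (1, "toy organism 1", "Bacteria", "GAAAC"),
    (2, "toy organism 2", "Bacteria", "GA-AC"),
    (3, "toy organism 3", "Archaea", "AAAAU"),
    (4, "toy organism 4", "Archaea", "CAAAG"),
]
for seq_id, organism, lineage, aligned in toy_rows:
    conn.execute(
        "INSERT INTO sequence VALUES (?, ?, ?)", (seq_id, organism, lineage)
    )
    conn.executemany(
        "INSERT INTO alignment_cell VALUES (?, ?, ?)",
        [
            (seq_id, col, nt)
            for col, nt in enumerate(aligned, start=1)
            if nt != "-"
        ],
    )

# Nucleotide pairs observed at alignment columns 1 and 5, counted per lineage.
pair_counts = conn.execute("""
SELECT s.lineage, a.nucleotide, b.nucleotide, COUNT(*)
FROM alignment_cell AS a
JOIN alignment_cell AS b ON a.seq_id = b.seq_id
JOIN sequence AS s ON s.seq_id = a.seq_id
WHERE a.column_no = 1 AND b.column_no = 5
GROUP BY s.lineage, a.nucleotide, b.nucleotide
ORDER BY s.lineage, a.nucleotide, b.nucleotide
""").fetchall()
```

Because each cell is its own row, a single SQL statement can combine alignment columns with phylogenetic information, which is the kind of cross-dimensional query described above.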
The second element of this project is a comprehensive, interactive visualization package for analyzing and manipulating large RNA datasets using Comparative Analysis. This software package is the Comparative Analysis Toolkit User Interface (CATUI) (manuscript in preparation). We intend for the CATUI to be the most sophisticated RNA Comparative Analysis sequence and structure visualization tool available. The CATUI will interact directly with the rCAD database platform.
The CATUI will support the manipulation and editing of sequence alignments containing more than 100,000 sequences. The sequence alignments can be in files or in the rCAD database. The CATUI will provide an integrated RNA secondary structure editing widget and advanced alignment navigation tools. Users will be able to navigate sequence alignments by taxonomy or common secondary structure. Below is a screen capture of the visualization of integrated dimensions in the CATUI (CATUI 3.0, not available until Feb 2010). In the screen capture, a sequence alignment is opened and the sequences are grouped by evolutionary relationships. A taxonomy browser allows the user to navigate the alignment in the vertical direction, and a "bird's-eye" viewer allows one to move across the alignment two-dimensionally.
1. Cannone JJ, Subramanian S, Schnare MN, et al. (2002) The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron and other RNAs. BMC Bioinformatics 3:2.
2. Gutell RR, Lee JC, Cannone JJ (2002) The accuracy of ribosomal RNA comparative structure models. Curr Opin Struct Biol 12:301-310.
3. Doshi KJ, Gutell RR, Ozer S (2010) rCAD: The RNA Comparative Analysis Database (manuscript submitted).
4. Xu W, Ozer S, Gutell R (2009) Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA. Scientific and Statistical Database Management. Berlin/Heidelberg: Springer. 200-216.
Additional Information and Credits
- This project is partially funded by grants from the National Institutes of Health, GM085337 and GM067317, and Microsoft External Research.
- Mark Smith of Julmar Technologies Inc. http://www.julmar.com