rCAD Alignment Loader Utility

Purpose

Load an RNA sequence alignment into an rCAD database.

Currently, the utility supports only RNA sequence alignments in the CRW format, available from the CRW Web Site (http://www.rna.ccbb.utexas.edu/DAT/3C/Alignment). The CRW format is documented at http://www.rna.ccbb.utexas.edu/AI/General/tech. CRW-formatted RNA datasets include sequence alignments with evolutionary relationship mapping and structure information. Because they integrate sequence, taxonomy, and structure, these datasets let the user exercise the full capabilities of rCAD. In the future, this utility will be expanded to support other alignment formats.

Note: If you are using SQL Server 2008 Express, be aware of its limitations: 4GB of user data per database, 1GB of memory, and 1 CPU socket. Some sequence alignments are too large to be loaded into an Express database; for others, performance may be unacceptable because of these limits. For each RNA sequence alignment provided at the CRW Web Site (link), the approximate size once loaded into rCAD and its suitability for Express are indicated.
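
As a rough illustration of the size limit above, the following Python sketch checks whether an alignment of a given loaded size would fit under the 4GB Express cap. The function name and the idea of adding the size of data already in the database are assumptions made for this example; they are not part of the utility.

```python
# Sketch only: the 4GB figure comes from the note above; everything else
# (names, the "existing data" parameter) is a hypothetical illustration.
EXPRESS_MAX_DB_GB = 4.0  # user data limit per database in SQL Server 2008 Express


def fits_express(loaded_size_gb: float, existing_data_gb: float = 0.0) -> bool:
    """Return True if the alignment plus existing data stays within the cap."""
    return loaded_size_gb + existing_data_gb <= EXPRESS_MAX_DB_GB


print(fits_express(1.5))        # a 1.5GB alignment in an empty database
print(fits_express(3.0, 1.5))   # a 3GB alignment on top of 1.5GB of data
```

Staying under the cap does not guarantee acceptable performance, since the memory and CPU limits still apply.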

1. Launch the Utility

  • Shortcut in Start Menu: All Programs\Gutell Lab\rCAD\Utilities: rCAD Alignment Loader
rCAD Alignment Loader Main Window

2. Identify the target SQL Server Instance

  • Enter the name of the SQL Server Instance with the rCAD database where the sequence alignment is to be loaded.
  • Leave the Instance field blank if the rCAD database is installed on a default SQL Server instance.
  • For SQL Express, enter SQLEXPRESS, the default instance name for Express installations.
rCAD Alignment Loader Main Window
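
The instance rules above can be sketched in Python as follows. This is a minimal illustration, assuming the database is on the local computer (written as "." in SQL Server tools); the function name is hypothetical and is not part of the utility.

```python
def server_name(instance: str = "", host: str = ".") -> str:
    """Build a SQL Server server string from an optional instance name.

    A blank instance means the default instance, so only the host is used.
    A named instance such as SQLEXPRESS is appended as host\\instance.
    """
    instance = instance.strip()
    return f"{host}\\{instance}" if instance else host


print(server_name())               # default instance on the local computer
print(server_name("SQLEXPRESS"))   # default SQL Express instance
```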

3. Identify the SQL Server Installation Type

  • If the target rCAD database is installed on SQL Express or on SQL Server without Integration Services, check the box to indicate this (A). Otherwise, skip to Step 4.
  • When the box is checked, a script and a set of data files are generated for loading the alignment with SQLCMD. Identify a directory for the script file and data files (B).
rCAD Alignment Loader Main Window

4. Configure the SQL Server Login Parameters

  • The database login must have bulk insert privileges on the target rCAD database.
  • Authentication can use either Windows Authentication (Trusted Connection) or SQL Server Security.
  • Windows Authentication is based on the current Windows login. That login must have bulk insert privileges on the target rCAD database.
  • If using Windows Authentication, continue to (Step 5).
  • If using SQL Server Security, enter a login id and password with bulk insert privileges on the target rCAD database.
rCAD Alignment Loader Main Window
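
To make the two authentication modes concrete, here is a hedged Python sketch that assembles an ODBC-style connection string for each. How the loader itself connects is not documented here; the driver name, the function name, and the example login are assumptions chosen for illustration.

```python
from typing import Optional


def connection_string(server: str, database: str,
                      user: Optional[str] = None,
                      password: Optional[str] = None) -> str:
    """Assemble an ODBC-style connection string (illustrative only).

    With no user, Windows Authentication (Trusted Connection) is requested.
    With a user, SQL Server Security is used with the given login and password.
    """
    # "SQL Server" as the driver name is an assumption for this example.
    parts = ["Driver={SQL Server}", f"Server={server}", f"Database={database}"]
    if user is None:
        parts.append("Trusted_Connection=yes")
    else:
        parts += [f"UID={user}", f"PWD={password or ''}"]
    return ";".join(parts) + ";"


# Windows Authentication against a local SQL Express instance
print(connection_string(".\\SQLEXPRESS", "rCAD"))
# SQL Server Security with a hypothetical login
print(connection_string(".", "rCAD", user="rcad_loader", password="secret"))
```

In either mode, the login that ends up being used still needs bulk insert privileges on the target rCAD database.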

5. Test the SQL Server Connection and Select the rCAD Database

  • Click the Test button to check the configured database connection (A).
  • On a successful connection test, the user is prompted to identify the desired rCAD database (B).
  • After the rCAD database is identified, click Next.
rCAD Alignment Loader Main Window

6. Select the RNA Sequence Alignment to Load into rCAD

  • Currently, only RNA sequence alignments formatted for import into rCAD are supported. The CRW format is a compressed representation of an RNA sequence alignment including secondary structure models.
  • Different test alignments are available at link
  • Select the CRW Alignment Type from the drop-down list, and then identify the alignment file to load from the file system.
rCAD Alignment Loader Step 2
  • Load the selected alignment into memory by pressing the Load Alignment button.
  • When the alignment is successfully loaded into memory, metadata is provided and the Next button is enabled.
  • Note: Large alignments may require a significant amount of memory (up to or exceeding 4GB).
rCAD Alignment Loader Step 2

7. Map the Selected RNA Sequence Alignment to the Selected rCAD Database

  • Review the metadata for each sequence in the alignment. For each row in the alignment, the Scientific Name, Cell Location, number of base pairs (if a secondary structure model was included), and NCBI GenBank identifier are provided.
rCAD Alignment Loader Step 3
  • Map the sequence alignment into the selected rCAD database by pressing the Map To RCAD button. The primary purpose of this mapping step is to identify keys (SeqID and TaxID) for each sequence in the alignment.
  • When the mapping is complete, you can review the proposed keys for each sequence in the alignment.
  • Note: Currently, if the same sequence appears in two different sequence alignments, it will be loaded twice. In the future, a merge step will be added to identify and remove duplicate sequences. Part of the challenge in identifying duplicates is distinguishing between different versions of the same GenBank identifier.
rCAD Alignment Loader Step 3
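
The version problem mentioned in the note can be illustrated with a short Python sketch. It assumes identifiers written in the GenBank accession.version style (for example, an accession followed by ".1" or ".2"); whether rCAD stores identifiers in this form is not stated here, and the function names and sample identifiers are hypothetical. This is not the planned merge step, only one way to flag candidates for review.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def base_accession(identifier: str) -> str:
    """Strip a trailing '.<version>' suffix, if one is present."""
    identifier = identifier.strip()
    base, dot, version = identifier.rpartition(".")
    if base and dot and version.isdigit():
        return base
    return identifier


def possible_duplicates(identifiers: Iterable[str]) -> Dict[str, List[str]]:
    """Group identifiers that share a base accession but appear more than once."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ident in identifiers:
        groups[base_accession(ident)].append(ident)
    return {base: ids for base, ids in groups.items() if len(ids) > 1}


# Made-up identifiers for illustration only
print(possible_duplicates(["X00001.1", "X00001.2", "Y00002.1"]))
```

Two versions of one accession may differ in sequence, so grouping by base accession only flags candidates; it does not prove that two rows are the same sequence.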

8. Load the RNA Sequence Alignment to the Selected rCAD Database

  • If the sequence alignment was successfully mapped to the target rCAD database in Step 7, the Load to RCAD button should be displayed.
  • Click the Load to RCAD button to load the sequence alignment.
rCAD Alignment Loader Step 3
  • If using SQL Server with SSIS, the program will load the sequence alignment directly into the rCAD database.
  • If using SQL Express 2008 or SQL Server without SSIS, the install script and data files will be output to the directory specified in Step 3. The generated script file will be named: rCADAlignmentLoader.sql.
    • Use the SQLCMD Utility (http://msdn.microsoft.com/en-us/library/ms162773.aspx) to execute the alignment load script.
    • The command line is (quote any path that contains spaces): "[Program Files]\Microsoft SQL Server\100\Tools\Binn\sqlcmd" -S .\[instance name] -i "[Specified Output Path]\rCADAlignmentLoader.sql"
    • Note: If using the default instance for SQL Server Standard or higher, [instance name] in the SQLCMD command line is blank, so the server argument is just -S . (a single dot). If using the default instance for SQL Express, [instance name] in the SQLCMD command line is SQLEXPRESS.
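
The command line above can also be assembled programmatically. The Python sketch below builds the argument list for sqlcmd using its -S, -i, -E, -U, and -P options; the function name, the example paths, and the choice to pass -E explicitly for Windows Authentication are assumptions for illustration. The sketch only prints the command and does not run it.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List, Optional


def sqlcmd_args(sqlcmd_path: str, output_dir: str, instance: str = "",
                user: Optional[str] = None,
                password: Optional[str] = None) -> List[str]:
    """Build the sqlcmd argument list for running rCADAlignmentLoader.sql."""
    # A blank instance means the default instance on the local computer.
    server = f".\\{instance}" if instance else "."
    script = str(PureWindowsPath(output_dir) / "rCADAlignmentLoader.sql")
    args = [sqlcmd_path, "-S", server, "-i", script]
    if user is None:
        args.append("-E")                       # Windows Authentication
    else:
        args += ["-U", user, "-P", password or ""]  # SQL Server Security
    return args


# Example paths are placeholders; substitute your own installation and output paths.
args = sqlcmd_args(r"C:\Program Files\Microsoft SQL Server\100\Tools\Binn\sqlcmd.exe",
                   r"C:\rcad_out", instance="SQLEXPRESS")
print(subprocess.list2cmdline(args))  # shows the quoted command line
```

Passing the arguments as a list, for example to subprocess.run(args, check=True), avoids manual quoting of paths that contain spaces.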

9. Visualize the RNA Sequence Alignment with the CATUI

  • If the CATUI is installed, it can be used to visualize RNA sequence alignments in an rCAD database. The topic is covered in more detail in the CATUI documentation; the steps are outlined below.
  • Open the CATUI. The shortcut in the Start Menu is: All Programs\Gutell Lab\CATUI\: CATUI 2.0 (BETA)
  • Right-click on Sequence Alignments and select *RCAD Database (RI)
CATUI Aln Viz
  • Create an rCAD database connection. The following fields are to be filled out:
    • Connection name: in this example, we use Test.
    • Server: The local computer.
    • Database Name: in this example, our database is named rCAD. Select your own rCAD database.
    • Logon: Windows Authentication (Trusted Connection).
CATUI Aln Viz
  • Select the sequence alignment to visualize. In this example, we are using a 5S Ribosomal RNA sequence alignment that is loaded in our example rCAD database. This alignment has the name 5S.A.aln. We want sequences from all locations (Nucleus, Mitochondrion, etc.). Set the appropriate Sequence Type and select your desired alignment.
CATUI Aln Viz
  • Select the phylogenetic subset of the selected sequence alignment to visualize. In our example, we will visualize all Archaea sequences.
CATUI Aln Viz
  • Click Next to dismiss the dialog. The CATUI will process the parameters and prepare to visualize the sequence alignment. When the alignment is listed with its icon, it is ready for visualization.
CATUI Aln Viz
  • To visualize the sequence alignment, right-click it, select Show View, and then select Default Alignment Viewer.
CATUI Aln Viz
  • The alignment visualization will open with the sequences sorted in alphabetical order. Refer to the CATUI documentation for different sorting options.
CATUI Aln Viz

Last edited Jan 28, 2010 at 6:57 PM by kjdoshi, version 24