CATUI Distributed Alignment Editor Requirements

Basic Requirements

  • Render genomes, single-gene alignments, and multi-gene alignments
    • How should circular genomes be handled, given that their alignments are no longer "linear"?
    • Are there ways to force the alignment of circular genomes into a grid, with the editor smart enough to know that the sequence fragments are non-contiguous?
    • First tackle: mitochondrial and chloroplast genomes
    • Next tackle: bacterial genomes
    • Then tackle: eukaryotic genomes?
  • What are the size limits for a sequence alignment (from an application perspective)?
    • Theoretically, alignments stored in the rCAD database can have an unlimited number of columns, because the alignment editor only acts as a "viewport" onto them
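
The "viewport" idea above can be sketched as follows. This is a minimal illustration only: the Viewport class and fetch_window function are hypothetical names, and an in-memory list of strings stands in for the rCAD database, whose actual interface is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Viewport:
    """Hypothetical description of the visible window of the alignment grid."""
    first_row: int
    first_col: int
    n_rows: int
    n_cols: int


def fetch_window(rows: List[str], vp: Viewport, gap: str = "-") -> List[str]:
    """Return only the cells visible in the viewport.

    `rows` is an in-memory stand-in for the alignment store (an assumption).
    Rows shorter than the window are padded with gap characters.
    """
    window = []
    for row in rows[vp.first_row : vp.first_row + vp.n_rows]:
        cells = row[vp.first_col : vp.first_col + vp.n_cols]
        window.append(cells.ljust(vp.n_cols, gap))
    return window


alignment = ["ACGU-ACGU", "AC--GACG", "A"]
visible = fetch_window(alignment, Viewport(first_row=0, first_col=2, n_rows=2, n_cols=4))
```

Because the editor only ever asks for the rows and columns inside the window, the cost of rendering depends on the window size rather than on the total width of the stored alignment.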

Alignment Rendering

  • Bird's-eye view ("see" the entire alignment density)
    • Abstract a sequence alignment as lines that can be zoomed in on to reveal the actual sequence data.
    • This is similar to map-viewing applications such as Google Earth or Microsoft Virtual Earth.
  • Reference sequence (user modifiable) for numbering, structure diagram overlay
  • Render contiguous and non-contiguous column sets
  • Adornments at row, cell, and column level indicate information/features (e.g., a 2D structure diagram is available for this particular row)
  • Open different/multiple alignments quickly
  • Bottom pane indicators
    • Consensus
    • Density
  • Zoom within the alignment grid
  • Mask out empty columns in a sub view of an alignment
    • This is a similar idea to showing the columns in two halves of a helix while masking out the intervening columns
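
The density and consensus indicators and the empty-column mask listed above could be computed along these lines. This is a sketch under stated assumptions: rows are plain strings, "-" is the gap character, and all function names are hypothetical rather than part of any existing CATUI code.

```python
from collections import Counter
from typing import List

GAP = "-"  # assumed gap character


def column_density(rows: List[str]) -> List[float]:
    """Fraction of rows holding a non-gap character in each column."""
    n_cols = max((len(r) for r in rows), default=0)
    density = []
    for c in range(n_cols):
        filled = sum(1 for r in rows if c < len(r) and r[c] != GAP)
        density.append(filled / len(rows))
    return density


def consensus(rows: List[str]) -> str:
    """Most common non-gap character per column; a gap if the column is empty."""
    n_cols = max((len(r) for r in rows), default=0)
    out = []
    for c in range(n_cols):
        counts = Counter(r[c] for r in rows if c < len(r) and r[c] != GAP)
        out.append(counts.most_common(1)[0][0] if counts else GAP)
    return "".join(out)


def mask_empty_columns(rows: List[str]) -> List[str]:
    """Drop columns in which every row has a gap."""
    keep = [c for c, d in enumerate(column_density(rows)) if d > 0]
    return ["".join(r[c] if c < len(r) else GAP for c in keep) for r in rows]


rows = ["AC--GU", "AC--GU", "AG--G-"]
density = column_density(rows)
masked = mask_empty_columns(rows)
```

The list of kept column indices is what a sub-view would need to store so that edits made in the masked view can be mapped back to the original columns.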

Alignment Editing

  • Group mode
    • Defined on whole rows or subsets of rows
      • Specify using view-based ranges or ordinals; this allows groups to be layered on top of contexts
    • Defined on entire alignment (all columns) or a subset of columns
    • Defined from a context or an analysis operation like Identity or Evaluate
    • Named groups
      • In some cases, will have to be persisted with an alignment, but not always
    • The same sequence can be a member of multiple groups.
    • Group definitions can persist across views/be referenced within different views
      • Because sequences know WHICH groups they are a member of.
    • Support undo operations within the group
    • Toggle groups on/off as opposed to un-defining them to turn them off
    • What about default groups?
  • Mouse-based or Keyboard-based editing
    • Slide operations (insert gaps as needed)
    • Move/drag operations (only move across existing gaps)
    • Insert operations
    • Lock sequences (nucleotides) from change by default
  • Displaying Edits & Undo
    • Undo support
      • Could even do multiple row undo (up to a topology change such as a column insert)
      • Revert the entire row back to the unedited state (what should be done about other rows that are edited and rely on a topology change?)
        • It may be that we don't have to undo the edits on other columns as long as they haven't populated an inserted column that is being removed.
        • Maybe remove an inserted column as well and just slide everything X registers to the right
    • Use differential background coloring to indicate which rows and columns have edits (in the grid view and the bird's-eye view).
    • Icon adornment and shading to indicate that row(s)/cell(s) are edited
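
The difference between the slide and move/drag operations listed above can be illustrated with a small sketch. The interpretation is an assumption: a slide inserts new gaps and therefore lengthens the row (a topology change), while a move only swaps a block with gaps that already exist. The function names are hypothetical.

```python
GAP = "-"  # assumed gap character


def slide_right(row: str, start: int, n: int) -> str:
    """Slide everything from column `start` onward right by n columns.

    New gaps are inserted as needed, so the row grows by n columns.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return row[:start] + GAP * n + row[start:]


def move_right(row: str, start: int, end: int, n: int) -> str:
    """Move the block row[start:end] right by n columns across existing gaps.

    The move is refused unless the n cells to the right of the block are all
    gaps, so the row length never changes.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    target = row[end : end + n]
    if len(target) < n or set(target) != {GAP}:
        raise ValueError("move blocked: destination is not all gaps")
    return row[:start] + target + row[start:end] + row[end + n :]


row = "ACG--U"
slid = slide_right(row, 1, 2)
moved = move_right(row, 0, 3, 2)
```

Neither operation alters the nucleotides themselves, only their column positions, which is consistent with locking sequences from change by default.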

Alignment navigation

  • Jump in column and row increments or jump to specific column or row ("Fast Scrolling")
  • Use the bird's-eye view to make large moves by point and click; this could also be done using some kind of alignment "line" representation
  • Zooming in and out to the point where sequences just look like abstract lines.

Alignment Searching

  • Search the sequence alignment for patterns of nucleotides (user-specified)
    • Example: AGNNC(0-10)CCGA-N
    • Define libraries of these search strings
    • Support the complete set of IUPAC characters?
    • Allow mismatches in specified nucleotides.
    • Indicate variable stretches
    • Optionally replace gaps with Ns in the search string. The default is to ignore gaps in the search string.
      • The consequence is that N could match nothing.
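
One possible way to interpret search strings such as the example above is to translate them into regular expressions. The sketch below rests on several assumptions that the requirements do not settle: "(0-10)" is read as a variable stretch of 0 to 10 arbitrary nucleotides, "-" is a gap in the search string, T and U are treated as equivalent, and matching runs against the ungapped sequence. The function names and the gaps_as_n flag are hypothetical, and mismatch tolerance is not covered.

```python
import re
from typing import List, Tuple

# Standard IUPAC nucleotide codes; T and U are treated as equivalent (assumption).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "TU", "U": "TU",
    "R": "AG", "Y": "CTU", "S": "CG", "W": "ATU", "K": "GTU", "M": "AC",
    "B": "CGTU", "D": "AGTU", "H": "ACTU", "V": "ACG", "N": "ACGTU",
}

_TOKEN = re.compile(r"(?P<base>[A-Za-z])|\((?P<lo>\d+)-(?P<hi>\d+)\)|(?P<gap>-)")


def compile_pattern(pattern: str, gaps_as_n: bool = False) -> "re.Pattern":
    """Translate a search string into a compiled regular expression."""
    parts = []
    pos = 0
    while pos < len(pattern):
        m = _TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unexpected character {pattern[pos]!r} at {pos}")
        if m.group("base"):
            code = m.group("base").upper()
            if code not in IUPAC:
                raise ValueError(f"not an IUPAC nucleotide code: {code!r}")
            parts.append(f"[{IUPAC[code]}]")
        elif m.group("lo") is not None:
            lo, hi = int(m.group("lo")), int(m.group("hi"))
            if lo > hi:
                raise ValueError("variable stretch has min greater than max")
            parts.append(f"[ACGTU]{{{lo},{hi}}}")  # variable stretch
        elif gaps_as_n:
            parts.append("[ACGTU]?")  # gap becomes an N that may match nothing
        # otherwise the gap in the search string is ignored (the default)
        pos = m.end()
    return re.compile("".join(parts))


def find_matches(pattern: str, sequence: str, gaps_as_n: bool = False) -> List[Tuple[int, int]]:
    """Return (start, end) spans of matches, in ungapped sequence coordinates."""
    ungapped = sequence.replace("-", "").upper()
    regex = compile_pattern(pattern, gaps_as_n)
    return [(m.start(), m.end()) for m in regex.finditer(ungapped)]


hits = find_matches("AGNNC(0-10)CCGA-N", "UUAG-GUCAAACCGAUU")
```

A library of search strings could then simply be a named collection of such patterns. Mapping the ungapped match coordinates back to alignment columns would be a further step that this sketch leaves out.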

Last edited Dec 5, 2008 at 8:57 PM by kjdoshi, version 5

