University of Washington Computer Science & Engineering
 Software from the Computational Molecular Biology Group


Alignment Quality
 

StatSigMA: Statistical Significance of Multiple Alignments

StatSigMA computes the statistical significance of multiple sequence alignments (of either nucleotide or amino acid sequences), much as BLAST's E-values provide statistical significance for pairwise alignments.

Download

If you use this software for your publications, please read and cite:

High Scoring Regions
 

MSS: Finding all Maximal Scoring Subsequences.

MSS is a practical, linear time algorithm to find, in a sequence of numeric scores, those nonoverlapping, contiguous subsequences having greatest total scores.

Download (C++ source code; contributed by Shane Neph).

If you use this software for your publications, please read and cite:

  • Ruzzo and Tompa: A Linear Time Algorithm for Finding All Maximal Scoring Subsequences. Seventh International Conference on Intelligent Systems for Molecular Biology, Heidelberg, Germany, August 1999, pp. 234-241, PMID: 10786306.
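As a rough illustration of what MSS computes, the following Python sketch finds all maximal scoring subsequences in a list of scores. It is not the distributed C++ code: the function name and the (start, end, total) return format are assumptions made for this example. For simplicity it rescans its list of candidates from right to left at each step, so it does not carry the linear-time guarantee of the published algorithm, which needs extra bookkeeping to avoid repeated scans.

```python
def maximal_scoring_subsequences(scores):
    """Return (start, end, total) triples, with end exclusive.

    Hypothetical helper for illustration only; not the MSS C++ interface.
    """
    # Each candidate is (start, end, L, R), where L is the cumulative score
    # before the candidate begins and R is the cumulative score at its end.
    candidates = []
    total = 0
    for i, s in enumerate(scores):
        before = total
        total += s
        if s <= 0:
            continue  # nonpositive scores never start a candidate
        start, end, L, R = i, i + 1, before, total
        while True:
            # Find the rightmost earlier candidate with a smaller L.
            j = len(candidates) - 1
            while j >= 0 and candidates[j][2] >= L:
                j -= 1
            if j < 0 or candidates[j][3] >= R:
                # Nothing to merge with: keep as a separate candidate.
                candidates.append((start, end, L, R))
                break
            # Merging raises the total: extend left to candidate j,
            # discard everything from j onward, and reconsider.
            start, L = candidates[j][0], candidates[j][2]
            del candidates[j:]
    return [(a, b, R - L) for a, b, L, R in candidates]


if __name__ == "__main__":
    print(maximal_scoring_subsequences([4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]))
```

On this input the sketch reports three subsequences: the single score 4, the single score 3, and the run from index 4 through index 10 with total 7.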

Microarrays
 

Dapple: Image analysis software for DNA microarrays

Dapple is a program for quantitating spots on a two-color DNA microarray image. Given a pair of images from a comparative hybridization, Dapple finds the individual spots on the image, evaluates their qualities, and quantifies their total fluorescent intensities.

Dapple Web Site

If you use this software for your publications, please read and cite:

Motif Discovery
 

COSMO: Binding sites in coding regions

COSMO is a program that detects putative binding sites in coding regions. Given a set of orthologous mRNA sequences, it identifies regions whose conservation cannot be explained solely by the selective pressure on the encoded protein.

COSMO

If you use this software for your publications, please read and cite:

  • Blanchette, M. A comparative analysis method for detecting binding sites in coding regions. In Proceedings of the Seventh Annual International Conference on Computational Molecular Biology (RECOMB03), Berlin, 2003.

FootPrinter: A program for phylogenetic footprinting

Phylogenetic footprinting is a method that identifies putative regulatory elements in DNA sequences. It identifies regions of DNA that are unusually well conserved across a set of orthologous sequences.

Web Server (FootPrinter2.1)

Download (FootPrinter2.1)

Sample output

If you use this software for your publications, please read and cite:

  1. Blanchette, M. and Tompa, M. FootPrinter: a program designed for phylogenetic footprinting. Nucleic Acids Research, vol. 31, no. 13, 2003, 3840-3842.
  2. Blanchette, M. and Tompa, M. Discovery of Regulatory Elements by a Computational Method for Phylogenetic Footprinting. Genome Research, vol. 12, no. 5, May 2002, 739-748.
  3. Blanchette, M., Schwikowski, B., and Tompa, M. Algorithms for Phylogenetic Footprinting. Journal of Computational Biology, vol. 9, no. 2, 2002, 211-223.

MicroFootPrinter: A microbial front end for FootPrinter

MicroFootPrinter is a front end to the FootPrinter phylogenetic footprinting program, with a specific focus on prokaryotic genomes. You supply a prokaryotic species and gene of interest. MicroFootPrinter will then find related prokaryotes each containing a homologous gene, and run FootPrinter to identify motifs in the regulatory region of your chosen gene that are well conserved across these homologous genes.

Web Server.

If you use this software for your publications, please read and cite:

PhyME: Motif discovery in data sets that include both intraspecies overrepresentation and interspecies conservation

PhyME discovers motifs by integrating two important aspects of the motif's significance, overrepresentation and interspecies conservation, into one probabilistic score. The algorithm is based on multiple alignment and expectation-maximization.

Download

If you use this software for your publications, please read and cite:

Projection: A motif discovery program based on random projections

Download

If you use this software for your publications, please read and cite:

  1. Buhler, J. and Tompa, M. Finding Motifs Using Random Projections. Journal of Computational Biology, vol. 9, no. 2, 2002, 225-242.
  2. Buhler, J. Provably sensitive indexing strategies for biosequence similarity search. In Proceedings of the Sixth Annual International Conference on Computational Molecular Biology (RECOMB02) 90-99, Washington, D.C., 2002.

YMF and FindExplanators: An enumerative motif discovery program.

YMF identifies motifs (made of IUPAC symbols) that occur unusually often in a given set of sequences. FindExplanators extracts from that set of motifs a smaller set of independent motifs.

Web Server

Download

If you use this software for your publications, please read and cite:

  1. Sinha, S. and Tompa, M., YMF: a Program for Discovery of Novel Transcription Factor Binding Sites by Statistical Overrepresentation. Nucleic Acids Research, vol. 31, no. 13, July 2003, 3586-3588.
  2. Sinha, S. and Tompa, M. Discovery of Novel Transcription Factor Binding Sites by Statistical Overrepresentation. Nucleic Acids Research, vol. 30, no. 24, December 2002, 5549-5560.
  3. Sinha, S. and Tompa, M. A Statistical Method for Finding Transcription Factor Binding Sites, Eighth International Conference on Intelligent Systems for Molecular Biology, San Diego, CA, August 2000, 344-354.
  4. Blanchette, M. and Sinha, S. Separating real motifs from their artifacts. Bioinformatics, vol. 17, 2001, S30-S38.
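To make "motifs made of IUPAC symbols" concrete, the sketch below counts how often a single IUPAC motif occurs in a set of sequences, including overlapping occurrences. It covers only the counting step. YMF's enumeration of candidate motifs and its statistical assessment of whether a count is unusually high are not reproduced here, and the function name and the full 15-letter nucleotide alphabet used are assumptions made for this example.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_motif(motif, sequences):
    """Count occurrences of an IUPAC motif, overlaps included.

    Hypothetical helper for illustration only; not part of YMF.
    """
    body = "".join("[" + IUPAC[symbol] + "]" for symbol in motif.upper())
    # A zero-width lookahead lets the regex engine report overlapping hits.
    pattern = re.compile("(?=" + body + ")")
    return sum(len(pattern.findall(seq.upper())) for seq in sequences)


if __name__ == "__main__":
    print(count_motif("RY", ["ACGT"]))
```

In the usage line, R matches A or G and Y matches C or T, so "ACGT" contains the motif at positions 0 (AC) and 2 (GT).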

RNA
 

CMfinder: A covariance model based RNA motif finding algorithm

CMfinder is a tool to predict RNA motifs in unaligned sequences. It is an expectation maximization algorithm using covariance models for motif description, featuring novel integration of multiple techniques for effective search of motif space, and a Bayesian framework that blends mutual information-based and folding energy-based approaches to predict structure in a principled way.
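The mutual information mentioned above measures how strongly two alignment columns covary, which is evidence that the two positions may be base-paired. The sketch below computes it for one pair of columns. It illustrates the general quantity only, not CMfinder's scoring: the function name, the choice to skip rows containing a gap, and the use of raw frequencies without pseudocounts are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_i, col_j, gap="-"):
    """Mutual information in bits between two alignment columns.

    Hypothetical helper for illustration only; not CMfinder's code.
    Each column is a string with one character per aligned sequence.
    """
    # Assumption: rows with a gap in either column are ignored.
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (count / n) * log2(count * n / (left[a] * right[b]))
    return mi


if __name__ == "__main__":
    # Four sequences whose two columns always form a Watson-Crick pair.
    print(column_mutual_information("ACGU", "UGCA"))
```

Two columns that vary together perfectly over four equally frequent bases give 2 bits, while two fully conserved columns give 0, which is why mutual information alone cannot detect conserved pairs and is combined with other evidence.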

Web site and supplementary information.

Web Server.

Download (C and Perl source code).

If you use this software for your publications, please read and cite:

Multiperm: Shuffling multiple sequence alignments while approximately preserving dinucleotide frequencies.

Assessing the statistical significance of structured RNA predicted from multiple sequence alignments relies on the existence of a good null model. Multiperm is a random shuffling algorithm that preserves not only the gap and local conservation structure in alignments of arbitrarily many sequences, but also the mono- and approximate dinucleotide frequencies. The latter characteristics have important effects on the predicted thermodynamic stability of RNA structures.

Download (C++ source code).

If you use this software for your publications, please read and cite:

  • Anandam, Torarinsson and Ruzzo. Multiperm: shuffling multiple sequence alignments while approximately preserving dinucleotide frequencies. Bioinformatics. 2009 Jan 9. [Epub ahead of print] PMID: 19136551.

RaveNnA: Faster Search for Non-coding RNA Families Without Loss of Accuracy

Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool to find new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are slow. The RaveNnA software package makes CMs faster while provably sacrificing none of their accuracy (or faster still with little loss in sensitivity, depending on parameter settings).

Download (C++ source code).

If you use this software for your publications, please read and cite:

  1. Weinberg and Ruzzo: Faster Genome Annotation of Non-coding RNA Families Without Loss of Accuracy. Eighth Annual International Conference on Research in Computational Molecular Biology (RECOMB 2004), pp 243-251, March 2004, San Diego, CA.
  2. Weinberg and Ruzzo: Exploiting Conserved Structure for Faster Annotation of Non-coding RNAs Without Loss of Accuracy. Bioinformatics, 20 (suppl_1) i334-i341, 2004 PMID: 15262817.
  3. Weinberg and Ruzzo: Sequence-based heuristics for faster annotation of non-coding RNA families. Bioinformatics, 2006, 22(1):35-39. PMID: 16267089.


Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to bio-webmaster]