Recent genomic analyses suggest that, although less than two percent of the human genome encodes proteins, the majority of DNA is transcribed into RNA. The functional significance of this widespread transcription remains controversial, but a steady stream of discoveries has identified surprising, complex, and subtle new roles for RNA in all realms of life. This ongoing research project is developing and applying new computational tools for the discovery of noncoding RNA.
Computationally, a key issue is that RNA mainly exists as single-stranded molecules that fold back on themselves to form complex structures. Interactions between nucleotides that are well separated in the RNA sequence but close in space greatly complicate the computational sequence analysis tasks needed to infer structures and to search for instances of them.
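To make the point about long-range interactions concrete, here is a minimal sketch that reads a secondary structure written in the widely used dot-bracket notation and reports how far apart each base pair lies in the sequence. The function names and the toy hairpin are illustrative assumptions and are not part of CMfinder or RaveNnA.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs for a dot-bracket string.

    '(' opens a base pair, ')' closes the most recently opened one,
    and '.' marks an unpaired position. Pseudoknots are not handled.
    """
    stack = []
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def pair_report(sequence, structure):
    """List (i, j, base_i, base_j, separation) for every base pair."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    return [
        (i, j, sequence[i], sequence[j], j - i)
        for i, j in parse_dot_bracket(structure)
    ]


if __name__ == "__main__":
    # Hypothetical toy hairpin: a 3-base-pair stem closing a 4-base loop.
    seq = "GGGAAAUCCC"
    struct = "(((....)))"
    for i, j, a, b, sep in pair_report(seq, struct):
        print(f"{a}{i} pairs with {b}{j} (separation {sep})")
```

Even in this ten-nucleotide toy hairpin, the outermost pair joins the first and last positions. A model that scores each position independently cannot see that dependency, which is why structure-aware methods are needed.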
We have developed a software tool called CMfinder for discovering RNA sequence/structure motifs present in a set of unaligned input sequences. Additionally, our RaveNnA tool accelerates searches that use such motifs as queries by orders of magnitude, with little or no loss in accuracy [14-16]. We have applied these tools together with MicroFootPrinter to discover new RNA motifs in bacteria. These tools have led to the discovery and characterization of many of the known classes of bacterial riboswitches [12; 6, 8, 13] and other functionally important RNAs, and genome-scale clustering approaches show promise for discovering more. We are also applying these tools to the human genome. Although reliable estimation of false discovery rates in such screens remains a difficult problem, preliminary results point to the presence of thousands of novel noncoding RNA genes conserved across vertebrates.
We have also been involved in genomic annotation of noncoding RNAs in model organisms, characterization of RNA in vertebrate alignments, and discovery of novel RNAs via "next gen" RNA sequencing. We have active ongoing projects on these and other aspects of RNA bioinformatics, and our tools are freely available.
1. "Multiperm: shuffling multiple sequence alignments while approximately preserving dinucleotide frequencies", Bioinformatics, vol. 25 (2009) 668-9.
2. "MicroRNA discovery and profiling in human embryonic stem cells by deep sequencing of small RNA libraries", Stem Cells, vol. 26 (2008) 2496-505.
3. "6S RNA is a widespread regulator of eubacterial RNA polymerase that resembles an open promoter", RNA, vol. 11 (2005) 774-84.
4. "Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote", PLoS Biol., vol. 4 (2006) e286.
5. "De novo prediction of structured RNAs from genomic sequences", Trends Biotechnol., vol. 28 (2010) 9-19.
6. "A glycine-dependent riboswitch that uses cooperative binding to control gene expression", Science, vol. 306 (2004) 275-9.
7. "MicroFootPrinter: a tool for phylogenetic footprinting in prokaryotic genomes", Nucleic Acids Res., vol. 34 (2006) W366-8.
8. "A widespread riboswitch candidate that controls bacterial genes involved in molybdenum cofactor and tungsten cofactor metabolism", Mol. Microbiol., vol. 68 (2008) 918-32.
9. "Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions", Genome Res., vol. 18 (2008) 242-51.
10. "Finding non-coding RNAs through genome-scale clustering", J Bioinform Comput Biol, vol. 7 (2009) 373-88.
11. "How accurately is ncRNA aligned within whole-genome multiple alignments?", BMC Bioinformatics, vol. 8 (2007) 417.
12. "Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline", Nucleic Acids Res., vol. 35 (2007) 4809-19.
13. "The aptamer core of SAM-IV riboswitches mimics the ligand-binding site of SAM-I riboswitches", RNA, vol. 14 (2008) 822-8.
14. "Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy", Bioinformatics, vol. 20 Suppl 1 (2004) i334-41.
15. "Faster Genome Annotation of Non-coding RNA Families Without Loss of Accuracy", Eighth Annual International Conference on Research in Computational Molecular Biology (RECOMB 2004), (2004) pp. 243-251.
16. "Sequence-based heuristics for faster annotation of non-coding RNA families", Bioinformatics, vol. 22 (2006) 35-9.
17. "A computational pipeline for high-throughput discovery of cis-regulatory noncoding RNA in prokaryotes", PLoS Comput. Biol., vol. 3 (2007) e126.
18. "CMfinder--a covariance model based RNA motif finding algorithm", Bioinformatics, vol. 22 (2006) 445-52.