Noncoding RNA

Recent genomic analyses suggest that, although less than two percent of the human genome encodes proteins, the majority of it is transcribed into RNA. The functional significance of this widespread transcription remains controversial, but a steady stream of discoveries has revealed surprising, complex, and subtle new roles for RNA in all domains of life. This ongoing research project develops and applies new computational tools for discovering noncoding RNA.

Computationally, a key issue is that RNA mainly exists as single-stranded molecules that fold back on themselves, forming base pairs that give rise to complex structures. Interactions between nucleotides that are well separated in the RNA sequence but close in space greatly complicate the sequence analysis needed to infer these structures and to search for new instances of them [5].
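To make this difficulty concrete, the sketch below implements the classic Nussinov base-pair maximization algorithm. It is a textbook simplification chosen for illustration, not the energy-based or covariance-model scoring used by CMfinder or RaveNnA. Because the base at position j may pair with any earlier position k, every interval must consider split points anywhere inside it, which gives O(n^3) time and O(n^2) memory. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin-loop length `min_loop=3` are conventional assumptions of this sketch.

```python
# Base pairs allowed in this sketch: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Return (max number of nested base pairs, one optimal dot-bracket structure)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of nested pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]

    def pair_score(i: int, j: int, k: int) -> int:
        # Score when position j pairs with position k (i <= k < j):
        # best structure left of k, plus this pair, plus best structure enclosed by it.
        left = best[i][k - 1] if k > i else 0
        return left + 1 + best[k + 1][j - 1]

    # Fill shorter intervals first; intervals with j - i <= min_loop stay 0.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: j is unpaired
            # Case 2: j pairs with some k, leaving >= min_loop unpaired bases between them.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    score = max(score, pair_score(i, j, k))
            best[i][j] = score

    # Traceback: recover one optimal structure in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS and best[i][j] == pair_score(i, j, k):
                structure[k], structure[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return best[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

The nested pairs at positions 0-8, 1-7, and 2-6 of the example are exactly the kind of "far apart in sequence, close in space" interactions described above; a model without pairing could score each position independently, but here every cell of the table depends on many others.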

We have developed a software tool called CMfinder for discovering RNA sequence/structure motifs in a set of unaligned input sequences [18]. Additionally, our RaveNnA tool speeds up searches that use such motifs as queries by orders of magnitude, with little or no loss of accuracy [14-16]. We have applied these tools together with MicroFootPrinter [7] to discover new RNA motifs in bacteria [17]. These tools have led to the discovery and characterization of many of the known classes of bacterial riboswitches [12; 6, 8, 13] and of other functionally important RNAs [3], and genome-scale clustering approaches show promise for discovering more [10]. We are also applying these tools to the human genome. Although reliably estimating false discovery rates in such screens remains difficult [1], preliminary results point to thousands of novel noncoding RNA genes conserved across vertebrates [9].
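One common way to estimate a false discovery rate for such a screen is to rerun the same search on shuffled copies of the input and compare how many hits clear a score threshold in the real versus the shuffled data. The sketch below is a hypothetical illustration of that general idea, not the procedure used in [1] or [9]; the function names `shuffle_sequence` and `empirical_fdr` and the `null_scale` parameter are our own. It deliberately uses a naive mononucleotide shuffle, which destroys dinucleotide composition; that is exactly the kind of bias that motivates dinucleotide-preserving alignment shuffling such as Multiperm [1].

```python
import random


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Naive mononucleotide shuffle: keeps base composition, destroys order.

    Caveat: dinucleotide frequencies (and, for alignments, column structure)
    are NOT preserved, which can make the null too easy to beat.
    """
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def empirical_fdr(real_scores, null_scores, threshold, null_scale=1.0):
    """Estimate FDR at `threshold` as (expected false hits) / (observed hits).

    null_scale (hypothetical parameter): ratio of real to shuffled search-space
    size, e.g. 0.5 if twice as much shuffled sequence was searched.
    """
    observed = sum(1 for s in real_scores if s >= threshold)
    if observed == 0:
        return 0.0
    expected_false = null_scale * sum(1 for s in null_scores if s >= threshold)
    return min(1.0, expected_false / observed)


if __name__ == "__main__":
    rng = random.Random(0)
    print(shuffle_sequence("ACGUACGU", rng))
    # Made-up scores for illustration only: 3 real hits and 1 null hit score >= 5.
    print(empirical_fdr([10, 8, 5, 2], [9, 3, 1, 1], threshold=5))  # 0.3333333333333333
```

The estimate is only as good as the null model: if shuffling removes features that real genomic sequence shares with true RNAs (such as dinucleotide biases), shuffled data produce too few high-scoring hits and the FDR is underestimated.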

We have also contributed to the genomic annotation of noncoding RNAs in model organisms such as the ciliate Tetrahymena thermophila [4], to assessing how accurately noncoding RNAs are aligned in whole-genome vertebrate alignments [11], and to the discovery of novel microRNAs via "next-generation" deep sequencing of small RNA libraries [2]. We have active ongoing projects on these and other aspects of RNA bioinformatics, and our tools are freely available.


  1. P. Anandam, E. Torarinsson, W. Ruzzo, "Multiperm: shuffling multiple sequence alignments while approximately preserving dinucleotide frequencies", Bioinformatics, vol. 25 (2009) 668-9. PubMed 19136551.
  2. M. Bar, S. Wyman, B. Fritz, J. Qi, K. Garg, R. Parkin, E. Kroh, A. Bendoraite, P. Mitchell, A. Nelson, W. Ruzzo, C. Ware, J. Radich, R. Gentleman, H. Ruohola-Baker, M. Tewari, "MicroRNA discovery and profiling in human embryonic stem cells by deep sequencing of small RNA libraries", Stem Cells, vol. 26 (2008) 2496-505. PubMed 18583537.
  3. J. Barrick, N. Sudarsan, Z. Weinberg, W. Ruzzo, R. Breaker, "6S RNA is a widespread regulator of eubacterial RNA polymerase that resembles an open promoter", RNA, vol. 11 (2005) 774-84. PubMed 15811922.
  4. J. Eisen, R. Coyne, M. Wu, D. Wu, M. Thiagarajan, J. Wortman, J. Badger, Q. Ren, P. Amedeo, K. Jones, L. Tallon, A. Delcher, S. Salzberg, J. Silva, B. Haas, W. Majoros, M. Farzad, J. Carlton, R. Smith, J. Garg, R. Pearlman, K. Karrer, L. Sun, G. Manning, N. Elde, A. Turkewitz, D. Asai, D. Wilkes, Y. Wang, H. Cai, K. Collins, A. Stewart, S. Lee, K. Wilamowska, Z. Weinberg, W. Ruzzo, D. Wloga, J. Gaertig, J. Frankel, C. Tsao, M. Gorovsky, P. Keeling, R. Waller, N. Patron, M. Cherry, N. Stover, C. Krieger, C. Toro, H. Ryder, S. Williamson, R. Barbeau, E. Hamilton, E. Orias, "Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote", PLoS Biol., vol. 4 (2006) e286. PubMed 16933976.
  5. J. Gorodkin, I. Hofacker, E. Torarinsson, Z. Yao, J. Havgaard, W. Ruzzo, "De novo prediction of structured RNAs from genomic sequences", Trends Biotechnol., vol. 28 (2010) 9-19. PubMed 19942311.
  6. M. Mandal, M. Lee, J. Barrick, Z. Weinberg, G. Emilsson, W. Ruzzo, R. Breaker, "A glycine-dependent riboswitch that uses cooperative binding to control gene expression", Science, vol. 306 (2004) 275-9. PubMed 15472076.
  7. S. Neph, M. Tompa, "MicroFootPrinter: a tool for phylogenetic footprinting in prokaryotic genomes", Nucleic Acids Res., vol. 34 (2006) W366-8. PubMed 16845027.
  8. E. Regulski, R. Moy, Z. Weinberg, J. Barrick, Z. Yao, W. Ruzzo, R. Breaker, "A widespread riboswitch candidate that controls bacterial genes involved in molybdenum cofactor and tungsten cofactor metabolism", Mol. Microbiol., vol. 68 (2008) 918-32. PubMed 18363797.
  9. E. Torarinsson, Z. Yao, E. Wiklund, J. Bramsen, C. Hansen, J. Kjems, N. Tommerup, W. Ruzzo, J. Gorodkin, "Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions", Genome Res., vol. 18 (2008) 242-51. PubMed 18096747.
  10. H. Tseng, Z. Weinberg, J. Gore, R. Breaker, W. Ruzzo, "Finding non-coding RNAs through genome-scale clustering", J. Bioinform. Comput. Biol., vol. 7 (2009) 373-88. PubMed 19340921.
  11. A. Wang, W. Ruzzo, M. Tompa, "How accurately is ncRNA aligned within whole-genome multiple alignments?", BMC Bioinformatics, vol. 8 (2007) 417. PubMed 17963514.
  12. Z. Weinberg, J. Barrick, Z. Yao, A. Roth, J. Kim, J. Gore, J. Wang, E. Lee, K. Block, N. Sudarsan, S. Neph, M. Tompa, W. Ruzzo, R. Breaker, "Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline", Nucleic Acids Res., vol. 35 (2007) 4809-19. PubMed 17621584.
  13. Z. Weinberg, E. Regulski, M. Hammond, J. Barrick, Z. Yao, W. Ruzzo, R. Breaker, "The aptamer core of SAM-IV riboswitches mimics the ligand-binding site of SAM-I riboswitches", RNA, vol. 14 (2008) 822-8. PubMed 18369181.
  14. Z. Weinberg, W. Ruzzo, "Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy", Bioinformatics, vol. 20 Suppl. 1 (2004) i334-41. PubMed 15262817.
  15. Z. Weinberg, W. Ruzzo, "Faster Genome Annotation of Non-coding RNA Families Without Loss of Accuracy", Eighth Annual International Conference on Research in Computational Molecular Biology (RECOMB 2004), (2004) 243-51.
  16. Z. Weinberg, W. Ruzzo, "Sequence-based heuristics for faster annotation of non-coding RNA families", Bioinformatics, vol. 22 (2006) 35-9. PubMed 16267089.
  17. Z. Yao, J. Barrick, Z. Weinberg, S. Neph, R. Breaker, M. Tompa, W. Ruzzo, "A computational pipeline for high-throughput discovery of cis-regulatory noncoding RNA in prokaryotes", PLoS Comput. Biol., vol. 3 (2007) e126. PubMed 17616982.
  18. Z. Yao, Z. Weinberg, W. Ruzzo, "CMfinder--a covariance model based RNA motif finding algorithm", Bioinformatics, vol. 22 (2006) 445-52. PubMed 16357030.