Noncoding RNA

Recent genomic analyses suggest that, although less than two percent of the human genome encodes proteins, the majority of DNA is transcribed into RNA. The functional significance of this widespread transcription remains controversial, but a steady stream of discoveries has identified surprising, complex, and subtle new roles for RNA in all realms of life. This ongoing research project is developing and applying new computational tools for the discovery of noncoding RNA.

Computationally, a key issue is that RNA mainly exists as single-stranded molecules that fold back on themselves to form complex structures. Interactions between nucleotides that are well separated in the RNA sequence but close in space greatly complicate the computational sequence analysis tasks needed to infer structures and search for instances of them [5].
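As a small illustration of why such interactions matter, the sketch below reads a secondary structure written in the standard dot-bracket notation and reports which positions pair with each other and how far apart they are in the sequence. The toy hairpin sequence and the helper name base_pairs are hypothetical and chosen only for illustration; this is not the representation or algorithm used by the tools described here.

```python
def base_pairs(structure):
    """Return sorted (i, j) index pairs for matched parentheses in dot-bracket notation."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Hypothetical toy hairpin: three G-C pairs closing a four-nucleotide loop.
sequence = "GGGAAAUCCC"
structure = "(((....)))"

pairs = base_pairs(structure)
for i, j in pairs:
    # Paired nucleotides are adjacent in space but j - i positions apart in sequence.
    print(f"{sequence[i]}{i} pairs with {sequence[j]}{j} (separation {j - i})")

longest_range = max(j - i for i, j in pairs)
```

Even in this ten-nucleotide example the first and last positions depend on each other, so a model that scores each position only from its immediate neighbors cannot capture the structure.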

We have developed a software tool called CMfinder for discovering RNA sequence/structure motifs present in a set of unaligned input sequences [18]. Additionally, our RaveNnA tool accelerates searches using such motifs as queries by orders of magnitude with little or no loss in accuracy [14-16]. We have applied these tools together with MicroFootPrinter [7] to discover new RNA motifs in bacteria [17]. These tools have led to the discovery and characterization of many of the known classes of bacterial riboswitches [12; 6, 8, 13] and other functionally important RNAs [3], and genome-scale clustering approaches show promise for discovering more [10]. We are also applying these tools to the human genome. Although reliable estimation of false discovery rates in such screens remains a difficult problem [1], preliminary results point to the presence of thousands of novel noncoding RNA genes conserved across vertebrates [9].

We have also been involved in genomic annotation of noncoding RNAs in model organisms [4], characterization of RNA in vertebrate alignments [11], and discovery of novel RNAs via "next-generation" RNA sequencing [2]. We have active ongoing projects on these and other aspects of RNA bioinformatics, and our tools are freely available.


