Supplementary Material

This web site supplements my article, "Faster genome annotation of non-coding RNA families without loss of accuracy", with the raw results of the scans, which contain the new ncRNA homologs discovered by the more sensitive technique. If you use these data in your own research, please cite my article:

Z. Weinberg and W.L. Ruzzo (2004) "Faster genome annotation of non-coding RNA families without loss of accuracy", Proc. Eighth Annual Inter. Conf. on Computational Molecular Biology (RECOMB), p. 243-251.
Preprint: in Postscript, in Adobe Acrobat .pdf

The software that implements the techniques in this paper is available for download here

 

The following table links to the current version of the Rfam database for each ncRNA family that was scanned for this paper. It also gives the raw results (.cmzasha) and a comma-separated file (.csv) that is more convenient to read and that marks which family members were already in Rfam 5.0 and which are new.

Instructions:

In the following table, the first column is the Rfam accession number (see the Rfam database Web site). Next is a link to the given family in the current version of Rfam (which at some point will be later than Rfam 5.0, the version used for my paper). Next is a brief description of the family; the Rfam link has a paragraph on each family and a couple of useful references. "# known" is the number of family members reported in Rfam 5.0. "# new" is the number of additional members found with the more sensitive technique (using the version of RFAMSEQ appropriate to Rfam 5.0). Last are the .cmzasha file, which is the raw output of my program, and the .csv file, which is the result of lightly processing the raw output.

(Download all result files from the table below at once: tar & gzip archive)

Rfam Id | link to Rfam | name | # known | # new | .cmzasha (raw scan) | .csv (comma-separated file)
RF00008 (Rfam 4.0) | N/A | Hammerhead ribozyme | 313 | 278 | file | N/A
RF00008 | link | Hammerhead ribozyme | 251 | 13 | file | file
RF00012 | link | U3 snRNA | 129 | 0 | file | file
RF00015 | link | U4 snRNA | 283 | 7 | file | file
RF00019 | link | Y RNA | 1107 | 0 | file | file
RF00020 | link | U5 snRNA | 199 | 1 | file | file
RF00024 | link | vertebrate telomerase | 51 | 0 | file | file
RF00025 | link | ciliate telomerase | 17 | 0 | file | file
RF00026 | link | U6 snRNA | 1462 | 2 | file | file
RF00027 | link | let-7 microRNA | 30 | 0 | file | file
RF00030 | link | RNase MRP | 39 | 0 | file | file
RF00032 | link | Histone 3' element | 1004 | 102 | file | file
RF00037 | link | Iron response element | 201 | 121 | file | file
RF00043 | link | Plasmid copy control | 8 | 0 | file | file
RF00050 | link | RFN element | 107 | 0 | file | file
RF00052 | link | lin-4 microRNA | 14 | 0 | file | file
RF00053 | link | mir-7 microRNA | 10 | 0 | file | file
RF00054 | link | U25 snoRNA | 28 | 0 | file | file
RF00055 | link | snoRNA Z37 | 28 | 0 | file | file
RF00066 | link | U7 snRNA | 312 | 1 | file | file
RF00075 | link | mir-166 microRNA | 14 | 0 | file | file
RF00093 | link | U18 snoRNA | 43 | 0 | file | file
RF00095 | link | Pyrococcus snoRNA | 57 | 123 | file | file
RF00103 | link | mir-1 microRNA | 13 | 0 | file | file
RF00104 | link | mir-10 microRNA | 34 | 0 | file | file
RF00151 | link | U58 snoRNA | 12 | 0 | file | file
RF00162 | link | S box | 128 | 3 | file | file
RF00163 | link | Hammerhead ribozyme | 167 | 26 | file | file
RF00164 | link | Coronavirus s2m | 115 | 0 | file | file
RF00165 | link | Coronavirus 3' UTR pseudoknot | 60 | 0 | file | file
RF00167 | link | Purine element | 69 | 54 | file | file
RF00169 | link | eubacterial SRP | 162 | 0 | file | file
RF00170 | link | Retron msr RNA | 11 | 48 | file | file
RF00173 | link | hairpin ribozyme | 5 | 0 | file | file

About the .csv format

This is a comma-separated file intended to be viewed in Microsoft Excel or a similar program. It is also plain text, so it can be opened in any text editor, although the files are very long and hard to read that way. A simple script (e.g., in Perl) can convert it to other formats.
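As an illustration of such a conversion script, here is a minimal sketch in Python (rather than Perl) that rewrites a comma-separated file as tab-separated text. It is not part of the software from the paper. The function name convert_rows is hypothetical, and the sketch assumes only that the .csv files follow ordinary CSV quoting rules; it makes no assumption about what the columns contain.

```python
import csv
import io


def convert_rows(src, dst):
    """Copy CSV rows from the open file `src` to `dst` as tab-separated text.

    Rows are streamed one at a time, so long files are not held in memory.
    Returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count


# Small in-memory demonstration with made-up values (not real scan data).
demo_in = io.StringIO('id,score,"note, quoted"\nx1,12.5,ok\n')
demo_out = io.StringIO()
rows_written = convert_rows(demo_in, demo_out)
```

To convert a real file, open the input and output with newline="" (as the Python csv module recommends) and pass the two file objects to convert_rows; the file names are up to you.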

The columns in the .csv file are:

About the .cmzasha format

The .cmzasha file is the raw output of our software. It largely duplicates the information in the .csv file. Additional information is:


Zasha Weinberg (zasha@cs.washington.edu)
Last Update: January 23, 2004.