This web site supplements my article, "Faster genome annotation of non-coding RNA families without loss of accuracy" with the raw results of the scans, containing the new ncRNA homologs discovered by the more sensitive technique. If you use this data in your own research, please cite my article:
Z. Weinberg and W.L. Ruzzo (2004) "Faster genome annotation of non-coding RNA families without loss of accuracy", Proc. Eighth Annual Inter. Conf. on Computational Molecular Biology (RECOMB), pp. 243-251.
Preprint: in Postscript, in Adobe Acrobat .pdf
The software that implements the techniques in this paper is available for download here.
The following table lists each ncRNA family that was scanned for this paper, with a link to that family in the current version of the Rfam database. For each family it also gives the raw results (.cmzasha) and a comma-separated file (.csv) that is more convenient to read and indicates which family members were already in Rfam 5.0 and which are new.
In the table, the first column is the Rfam accession number (see the Rfam database Web site). Next is a link to the family in the current version of Rfam (which at some point will be later than the Rfam 5.0 version used for my paper). Next is a brief description of the family; the Rfam page has a paragraph on each family and a couple of useful references. "# known" is the number of family members reported in Rfam 5.0. "# new" is the number of additional members found with the more sensitive technique (using the version of RFAMSEQ appropriate to Rfam 5.0). The last two columns are the .cmzasha file, which is the raw output of my program, and the .csv file, which is the result of lightly processing that raw output.
(Download all results files from the below table at once: tar & gzip archive)
| Rfam Id | Link to Rfam | Name | # known | # new | Raw results (.cmzasha) | Processed (.csv) |
|---|---|---|---|---|---|---|
| RF00032 | link | Histone 3' element | 1004 | 102 | file | file |
| RF00037 | link | Iron response element | 201 | 121 | file | file |
| RF00043 | link | Plasmid copy control | 8 | 0 | file | file |
| RF00165 | link | Coronavirus 3' UTR pseudoknot | 60 | 0 | file | file |
| RF00170 | link | Retron msr RNA | 11 | 48 | file | file |
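To put the "# new" column in perspective, the short sketch below totals the counts and expresses each family's new hits as a percentage of its Rfam 5.0 members. The numbers are copied directly from the table; the names `FAMILIES` and `summarize` are illustrative only and are not part of the released software.

```python
# Counts transcribed from the table: (name, members in Rfam 5.0, new members found).
FAMILIES = {
    "RF00032": ("Histone 3' element", 1004, 102),
    "RF00037": ("Iron response element", 201, 121),
    "RF00043": ("Plasmid copy control", 8, 0),
    "RF00165": ("Coronavirus 3' UTR pseudoknot", 60, 0),
    "RF00170": ("Retron msr RNA", 11, 48),
}


def summarize(families):
    """Return (total known, total new, {accession: % increase over known})."""
    total_known = sum(known for _, known, _ in families.values())
    total_new = sum(new for _, _, new in families.values())
    # Every family in this table has at least one known member, so no division by zero.
    increase = {
        acc: 100.0 * new / known for acc, (_, known, new) in families.items()
    }
    return total_known, total_new, increase


if __name__ == "__main__":
    known, new, increase = summarize(FAMILIES)
    print(f"known={known} new={new}")  # prints "known=1284 new=271"
    for acc, pct in increase.items():
        print(f"{acc}: +{pct:.1f}%")
```

The totals are 1284 known members and 271 new ones; relative to Rfam 5.0, the gain is largest for the retron msr RNA family (48 new vs. 11 known).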
The .csv file is a comma-separated file intended to be viewed in Microsoft Excel or a similar program. It is also plain text, so any text editor can open it, but the files are very long and hard to read that way. A simple script (e.g. in Perl) can convert it to other formats.
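As one example of such a script, here is a minimal Python sketch (Python's standard `csv` module used in place of Perl) that rewrites a .csv file as tab-separated text. It makes no assumptions about the number or names of the columns; the function name `csv_to_tsv` and the file names are hypothetical.

```python
import csv


def csv_to_tsv(src_path, dst_path):
    """Copy every row of a comma-separated file into a tab-separated file.

    Quoted fields that contain commas are parsed correctly by csv.reader,
    so they survive the conversion intact. Returns the number of rows written.
    """
    rows = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, delimiter="\t")
        for row in reader:
            writer.writerow(row)
            rows += 1
    return rows
```

For example, `csv_to_tsv("RF00032.csv", "RF00032.tsv")` would write a tab-separated copy, assuming the downloaded file was saved under that name.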
The columns in the .csv file are:
The .cmzasha file is the raw output of our software. Most of its information duplicates the .csv file; the additional information it contains is: