This page contains supplementary information to the following paper:

Zasha Weinberg and Walter L. Ruzzo, "Sequence-based heuristics for faster annotation of non-coding RNA families", Bioinformatics, to appear.

 

Software

The software is here

Supplementary paper

This supplementary paper contains details on algorithms and implementation, as well as the full set of ROC-like curves.

Raw ROC-like Curves

The raw points for the ROC-like curves are supplied as comma-separated files.  The first column is the sensitivity, the second is the filtering fraction, and the third is the heuristic threshold score (in log_2 units) that yields this sensitivity and filtering fraction.
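As an illustration of the three-column layout described above, the following Python sketch reads one such file into a list of points. The names `RocPoint` and `read_roc_points` are hypothetical and not part of the distributed software, and the sample values are invented for the example. The sketch also assumes that any line whose first three fields are not all numeric (for example a header line, if the files have one) can be skipped.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class RocPoint(NamedTuple):
    """One point of a ROC-like curve (hypothetical container)."""
    sensitivity: float          # column 1
    filtering_fraction: float   # column 2
    threshold_log2: float       # column 3, heuristic threshold in log_2 units


def read_roc_points(handle: TextIO) -> List[RocPoint]:
    """Read (sensitivity, filtering fraction, threshold) rows from a CSV stream."""
    points = []
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # assumption: short or blank lines carry no data
        try:
            values = [float(field) for field in row[:3]]
        except ValueError:
            continue  # assumption: non-numeric lines (e.g. a header) are skipped
        points.append(RocPoint(*values))
    return points


# Invented sample data, used only to exercise the function.
sample = io.StringIO("1.0,0.25,-3.5\n0.9,0.01,4.0\n")
points = read_roc_points(sample)
```

To read a real file, pass an open file object instead of the in-memory sample, for instance `read_roc_points(open(path, newline=""))`.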

The complete set is available as a gzipped tar archive: here (3 MB).  The file names are described below.

Rfam families ROC-like curves

All files end in ".posvsfrac.csv".

The first part of the file name is the Rfam ID: RF00001, RF00005, RF00010, RF00029, RF00031, RF00059, RF00168 or RF00174.
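A small Python sketch of how the Rfam ID could be separated from the rest of a file name, using only the prefix and suffix stated above. The function name `split_rfam_curve_name` and the example file name are hypothetical. The sketch makes no assumption about how the remaining parts of the name are delimited, so it returns them as an unparsed string.

```python
from typing import Optional, Tuple

SUFFIX = ".posvsfrac.csv"
RFAM_IDS = (
    "RF00001", "RF00005", "RF00010", "RF00029",
    "RF00031", "RF00059", "RF00168", "RF00174",
)


def split_rfam_curve_name(file_name: str) -> Optional[Tuple[str, str]]:
    """Return (Rfam ID, remainder of the name) or None if the name does not match."""
    if not file_name.endswith(SUFFIX):
        return None
    stem = file_name[: -len(SUFFIX)]
    for rfam_id in RFAM_IDS:
        if stem.startswith(rfam_id):
            # The remainder is left unparsed: its internal format is not assumed here.
            return rfam_id, stem[len(rfam_id):]
    return None


# "RF00005-example" is an invented name, used only to exercise the function.
result = split_rfam_curve_name("RF00005-example.posvsfrac.csv")
```

Names that lack the suffix or do not start with one of the eight listed IDs yield `None`.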

The next part of the file name describes what filter was tested:

tRNAscan-SE scans

All files end in ".posvsfrac.csv".  All files use the compact-type ML-heuristic.

For historical reasons, the prokaryote files begin with "TRNA2", while the eukaryote files begin with "tRNAscan-TRNA2".  Next, the organism or organisms are specified:

Next, the window length for the ML-heuristic is given:

Other data on tRNAscan-SE scans of these same databases are available from our previous paper; see this Web supplement.