Supplementary Material
This web site supplements my article, "Exploiting Conserved Structure for Faster
Annotation of Non-coding RNAs Without Loss of Accuracy" with the raw
results of the scans, containing the new ncRNA homologs discovered by the
more sensitive technique. If you use this data in your own research,
please cite my article:
Z. Weinberg and W.L. Ruzzo (2004) "Exploiting Conserved Structure for Faster
Annotation of Non-coding RNAs Without Loss of Accuracy", Bioinformatics,
20 (suppl. 1): i334-i340. Presented at the
12th International Conference on Intelligent Systems for Molecular Biology
(ISMB 2004).
Download preprint: in Adobe Acrobat
(pdf), in PostScript
Technical supplement
This supplementary paper describes additional technical information on
the implementation:
Download supplement: in Adobe Acrobat
(pdf), in PostScript
(Last updated October 6, 2004.)
Software download
The software that implements the techniques in this paper is available for download here.
Raw results of scans
The following table has a link to the current version of the
Rfam Database for each ncRNA family that was scanned for this
paper. It then gives the raw results (.cmzasha), and a comma-separated
file (.csv) that is more convenient to look at and indicates which family
members were already in Rfam 5.0, and which were new. These results are
for Rfam 5.0, which is no longer the current version.
Instructions:
In the following table, the first column is the Rfam accession number (see
Rfam database Web site). Next is a link to the given family in
the current version of Rfam (which at some point in the future will be later
than the Rfam 5.0 version that was used for my paper). Next is a brief
description of the family; the Rfam link has a paragraph on each family, and a
couple of useful references. The # known is the number of family members
as reported in Rfam 5.0. # new is the number of additional members found
with the more sensitive technique (using the version of RFAMSEQ appropriate to
Rfam 5.0). Next is the .cmzasha file, which is the raw output of my
program, and the .csv file, which is the result of processing the raw output
slightly.
(Download all results files from the below table at once: tar
& gzip archive)
Filter series used in scans
The following describes, for each family scanned, the series of filters used in
that scan. The selection of a filter series is now a fully automated
process, as described in the technical supplement. However, the scheme
described in the ISMB paper is partially manual, and for some of the easier
families, even more of the process was done manually; for some families, such as
RF00004, a better filter series could almost certainly be found.
For each family, a list is given showing each filter in the order it is
applied. Each filter begins with 'hmm', 'sub' (sub-CM), or
'store-pair'.
If 'hmm', a profile HMM is used, and its type (expanded or compact) is
given. (Note that the sub-CM and store-pair modifications are all applied
on top of the expanded-type HMM.)
After 'sub', the node at which the sub-CM is rooted is given, followed by the
sub-CM-specific window length. For example, for RF00001,
"sub,40,60" is used, which is a sub-CM rooted at node 40 using a window length
of only 60 (even though the full ncRNA requires a window length of 180).
If multiple sub-CMs are used simultaneously in one filter, they are separated
by forward slashes ('/').
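To make the 'sub' notation concrete, here is a minimal Python sketch that splits such a filter into (root node, window length) pairs. It assumes only the 'sub,NODE,WINDOW' syntax and the '/' separator described above; the function name parse_sub_filter is hypothetical and is not part of the released software.

```python
from typing import List, Tuple


def parse_sub_filter(spec: str) -> List[Tuple[int, int]]:
    """Split a filter such as 'sub,30,40/sub,49,40' into (root_node, window_length) pairs.

    Hypothetical helper: each '/'-separated part is assumed to have the
    form 'sub,NODE,WINDOW', as in the filter listings on this page.
    """
    sub_cms = []
    for part in spec.split("/"):
        fields = part.strip().split(",")
        if len(fields) != 3 or fields[0] != "sub":
            raise ValueError(f"not a sub-CM filter: {part!r}")
        # fields[1] is the node the sub-CM is rooted at;
        # fields[2] is the sub-CM-specific window length.
        sub_cms.append((int(fields[1]), int(fields[2])))
    return sub_cms


print(parse_sub_filter("sub,40,60"))            # [(40, 60)]
print(parse_sub_filter("sub,30,40/sub,49,40"))  # [(30, 40), (49, 40)]
```

The first call corresponds to the RF00001 example above: one sub-CM rooted at node 40 with a window length of 60.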
After 'store-pair', the list of modifications is given. Each modification
consists of a number followed by a string. The number is the node that is
to be modified. The string specifies what information should be stored
for the pair at that node. The first letter says whether the left or
right nucleotide is stored: 'l'=left, 'r'=right. The remainder specifies
a partition of the 5 symbols ACGU_ indicating what is stored. The
underscore ('_') represents the empty character (written as epsilon in the
paper). To specify the partition, its sets are separated by dashes ('-'), and
each set is written as its symbols run together. For example, "store-pair,3,l-A_-CG-U,76,r-AG_-C-U"
says that (1) node 3 is modified by remembering which of the following
sets the left nucleotide fits into: {A,_} or {C,G} or {U}, and (2) node 76 is
modified by remembering which of the following sets the right nucleotide fits
into: {A,G,_} or {C} or {U}.
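The store-pair notation can be read mechanically. Below is a minimal Python sketch that parses a store-pair filter into (node, side, partition) triples, checks that each partition really covers ACGU_ exactly once, and reports which set a given symbol falls into (the information the modified node remembers). The names parse_store_pair and stored_class are hypothetical illustrations, not functions from the released software.

```python
from typing import FrozenSet, List, Tuple

# The five symbols; '_' is the empty character (epsilon in the paper).
SYMBOLS = frozenset("ACGU_")

Modification = Tuple[int, str, List[FrozenSet[str]]]


def parse_store_pair(spec: str) -> List[Modification]:
    """Parse 'store-pair,NODE,CODE,NODE,CODE,...' into (node, side, partition) triples."""
    fields = spec.strip().split(",")
    if fields[0] != "store-pair" or len(fields) % 2 != 1:
        raise ValueError(f"not a store-pair filter: {spec!r}")
    mods = []
    for node_str, code in zip(fields[1::2], fields[2::2]):
        side, *blocks = code.split("-")  # 'l-A_-CG-U' -> 'l', ['A_', 'CG', 'U']
        if side not in ("l", "r"):
            raise ValueError(f"side must be 'l' or 'r': {code!r}")
        symbols = [s for block in blocks for s in block]
        # A valid partition uses every symbol exactly once and has no empty sets.
        if any(not block for block in blocks) or sorted(symbols) != sorted(SYMBOLS):
            raise ValueError(f"not a partition of ACGU_: {code!r}")
        mods.append((int(node_str), side, [frozenset(block) for block in blocks]))
    return mods


def stored_class(partition: List[FrozenSet[str]], symbol: str) -> int:
    """Index of the set containing `symbol`, i.e. what the modified node remembers."""
    for i, block in enumerate(partition):
        if symbol in block:
            return i
    raise ValueError(f"unknown symbol {symbol!r}")


mods = parse_store_pair("store-pair,3,l-A_-CG-U,76,r-AG_-C-U")
node, side, partition = mods[1]
print(node, side, stored_class(partition, "G"))  # 76 r 0
```

For node 76 in the example, a right nucleotide of G falls into the first set, {A,G,_}, so index 0 is stored.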
-
RF00001
-
hmm (expanded-type)
-
store-pair,47,l-A-C-G-U-_,87,l-A-C-G-U-_
-
store-pair,43,l-A-C-G-U-_,44,l-A-C-G-U-_,86,l-A-C-G-U-_,87,l-A-C-G-U-_
-
sub,40,60
-
RF00004
-
hmm (expanded-type)
-
store-pair,60,l-A-C-G-U-_,78,l-A-C-G-U-_,26,l-A-C-G-U-_,130,l-A-C-G-U-_,152,l-A-C-G-U-_
-
store-pair,59,l-A-C-G-U-_,60,l-A-C-G-U-_,77,l-A-C-G-U-_,78,l-A-C-G-U-_,25,l-A-C-G-U-_,26,l-A-C-G-U-_,129,l-A-C-G-U-_,130,l-A-C-G-U-_,151,l-A-C-G-U-_,152,l-A-C-G-U-_
-
RF00005
-
store-pair,17,l-A_-CG-U,16,l-AG_-C-U,34,r-A_-CG-U,33,r-AG_-CU,32,r-AG_-CU,53,r-AGU-C-_,52,l-CGU-A_,51,r-AG_-CU
-
store-pair,17,l-A_-CG-U,16,l-AG_-C-U,15,l-CU_-AG,34,r-C_-A-G-U,33,r-AG_-CU,32,r-AG_-CU,31,l-AG_-CU,53,r-AGU-C-_,52,l-CU_-A-G,51,r-AG_-CU,50,l-AG_-CU
-
sub,30,40/sub,49,40
-
sub,30,40/sub,49,40/sub,15,40
-
RF00009
-
store-pair,125,l-A-C-G-U-_,126,l-A-C-G-U-_,139,l-A-C-G-U-_,140,l-A-C-G-U-_,153,l-A-C-G-U-_,154,l-A-C-G-U-_,80,l-A-C-G-U-_,81,l-A-C-G-U-_,195,l-A-C-G-U-_,196,l-A-C-G-U-_
-
RF00010
-
hmm (expanded-type)
-
store-pair,57,r-AG-U_-C,56,l-ACU_-G,72,r-AG-U_-C,71,r-AG_-CU,117,r-AG-CU-_,144,r-AGU-C-_,163,r-AGU-C-_,162,l-AG_-CU,203,r-ACGU-_,202,l-ACGU-_,229,r-A_-GU-C,228,l-AG_-CU,262,l-AU-C-G-_,261,r-CGU-A_
-
store-pair,57,r-AG-U_-C,56,l-ACU_-G,55,r-AGU-C_,72,r-AG-U_-C,71,r-AG_-CU,70,l-CU_-AG,117,r-AG-CU-_,116,r-AG-CU-_,144,r-AGU-C-_,143,r-AU-C-G-_,163,r-AGU-C-_,162,l-A-C-G-U-_,203,r-ACGU-_,202,l-ACGU-_,197,r-ACGU-_,229,r-A_-GU-C,228,l-AG-C-U-_,262,r-AC-GU-_,261,r-CGU-A_,260,r-AG-CU-_
-
store-pair,57,r-AG-U_-C,56,l-ACU_-G,55,r-AGU-C_,54,l-AGU-C_,72,r-AU-C-G-_,71,r-AG_-CU,70,l-CU_-AG,69,r-ACU-G_,117,r-AG-CU-_,116,r-ACU-G_,115,l-ACU-G_,144,r-AGU-C-_,143,r-AU-C-G-_,140,r-AGU_-C,163,r-AGU-C-_,162,l-A-C-G-U-_,161,r-CU_-AG,203,r-ACGU-_,202,l-ACGU-_,197,r-ACGU-_,196,l-ACGU-_,229,r-AG-C-U-_,228,l-AG-C-U-_,224,l-AG_-CU,262,r-AC-GU-_,261,l-AG-CU-_,259,r-AG-CU-_
-
store-pair,57,r-AG-U_-C,56,l-ACU_-G,55,r-AGU-C_,54,l-AGU-C_,53,l-AGU_-C,72,r-AU-C-G-_,71,r-AG_-CU,70,l-CU_-AG,69,r-ACU-G_,67,l-AC_-GU,117,r-AG-CU-_,116,r-AG-CU-_,115,l-ACU-G_,113,r-ACU_-G,144,r-AGU-C-_,143,r-AU-C-G-_,140,r-AGU_-C,139,l-ACU_-G,163,r-AGU-C-_,162,l-AG-C-U-_,161,r-CU_-AG,157,l-CU_-AG,203,r-ACGU-_,202,l-ACGU-_,197,r-ACGU-_,196,l-ACGU-_,178,l-ACGU-_,229,r-AG-C-U-_,228,l-AG-C-U-_,225,l-ACU-G_,224,l-AG_-CU,262,r-AC-GU-_,261,l-AG-CU-_,260,r-AG-CU-_,259,r-AG-CU-_
-
store-pair,57,r-AG-U_-C,56,l-ACU_-G,55,r-A_-GU-C,54,l-AGU-C_,53,l-AGU_-C,72,r-AU-C-G-_,71,l-A_-CU-G,70,l-CU_-AG,69,r-ACU-G_,67,l-AC_-GU,117,r-AG-CU-_,116,r-AG-CU-_,115,l-AG-CU-_,113,r-ACU_-G,144,r-AGU-C-_,143,l-AG-CU-_,142,l-ACG-U_,140,r-AGU_-C,139,l-ACU_-G,163,r-AGU-C-_,162,l-A-C-G-U-_,161,r-CU_-AG,157,l-CU_-AG,156,r-ACU_-G,203,r-ACGU-_,202,l-ACGU-_,197,r-ACGU-_,196,l-ACGU-_,177,l-AGU-C-_,229,r-AG-C-U-_,228,l-AG-C-U-_,225,r-AG-CU-_,224,l-AG_-CU,262,r-AC-GU-_,261,l-AG-CU-_,260,r-AG-CU-_,259,r-AG-CU-_,255,l-ACGU-_
-
store-pair,57,r-AG-C-U-_,56,l-ACU_-G,55,r-A_-GU-C,54,r-CU_-A-G,53,l-AGU_-C,72,r-AU-C-G-_,71,l-A_-CU-G,70,l-CU_-AG,69,r-ACU-G_,67,r-A_-CU-G,117,r-AG-CU-_,116,r-AG-CU-_,115,l-ACU-G_,114,r-ACG-U_,113,r-ACU_-G,144,r-AGU-C-_,143,r-AU-C-G-_,142,l-ACG-U_,140,r-AGU_-C,139,l-ACU_-G,163,r-AGU-C-_,162,l-A-C-G-U-_,161,r-CU_-AG,157,l-CU_-AG,156,r-ACU_-G,203,r-ACGU-_,202,l-ACGU-_,197,r-ACGU-_,196,l-ACGU-_,178,l-ACGU-_,176,l-CGU-A_,229,r-AG-C-U-_,228,l-AG-C-U-_,225,r-AG-CU-_,224,l-A_-GU-C,262,r-AC-GU-_,261,l-AG-CU-_,260,r-AG-CU-_,259,r-AG-CU-_,255,l-ACGU-_
-
store-pair,57,r-AG-C-U-_,56,l-ACU_-G,55,r-A_-GU-C,54,r-CU_-A-G,53,l-AGU_-C,72,r-AU-C-G-_,71,l-A_-CU-G,70,l-A_-CU-G,69,r-ACU-G_,67,r-A_-CU-G,117,r-AG-CU-_,116,r-AG-CU-_,115,l-AG-CU-_,113,r-ACU_-G,112,r-ACU_-G,144,r-AGU-C-_,143,r-AU-C-G-_,142,l-ACG-U_,141,r-ACU-G_,140,r-AGU_-C,139,l-ACU_-G,163,r-AGU-C-_,162,l-AG-C-U-_,161,r-CU_-AG,157,l-CU_-AG,156,r-ACU_-G,154,l-ACU_-G,203,r-ACGU-_,202,l-ACGU-_,197,r-ACGU-_,196,l-ACGU-_,177,l-AGU-C-_,176,l-CGU-A_,229,r-A_-GU-C,228,l-AG-C-U-_,227,r-AG-CU-_,225,r-AG-CU-_,224,l-AG_-CU,262,r-AC-GU-_,261,l-AG-CU-_,260,r-AG-CU-_,259,r-AG-CU-_,258,r-AGU-C_,255,l-ACGU-_
-
sub,222,61/sub,53,202/sub,67,162/sub,202,102/sub,175,430
-
store-pair,57,r-AG-C-U-_,56,l-ACU-G-_,55,r-A-C-G-U-_,54,r-CU_-A-G,53,l-AU_-C-G,72,r-AU-C-G-_,71,r-AU-C-G-_,70,l-A_-CU-G,69,r-A_-CU-G,67,r-U_-A-C-G,117,r-AG-CU-_,116,r-AG-CU-_,115,l-AG-CU-_,114,r-ACG-U_,113,r-ACU_-G,112,r-ACU_-G,108,r-ACU_-G,144,r-AGU-C-_,143,r-AU-C-G-_,142,l-ACU-G-_,141,r-ACU-G_,140,r-AGU-C-_,139,l-ACU_-G,163,r-AGU-C-_,162,l-A-C-G-U-_,161,r-CU_-AG,159,l-AGU-C_,157,l-CU-A-G-_,156,r-ACU_-G,154,l-ACU_-G,203,r-ACGU-_,202,l-ACGU-_,197,r-ACGU-_,196,l-ACGU-_,191,r-AGU-C_,178,l-ACGU-_,177,l-AGU-C-_,176,l-CGU-A_,229,r-AG-C-U-_,228,l-AG-C-U-_,227,r-AG-CU-_,225,r-AG-CU-_,224,l-A_-GU-C,223,r-ACU-G_,262,r-AC-GU-_,261,l-AG-CU-_,260,r-AG-CU-_,259,r-AG-CU-_,258,l-AG-CU-_,256,r-ACGU-_,255,l-ACGU-_
-
RF00017
-
hmm (compact-type)
-
hmm (expanded-type)
-
store-pair,13,l-CU_-AG,33,l-AG-CU-_,193,r-CU_-AG,192,l-ACU-G_,235,r-AGU-C_
-
store-pair,13,l-ACU-G-_,12,r-AU-C-G-_,33,l-AG-CU-_,32,l-AGU-C-_,31,l-AGU-C-_,193,r-CU_-AG,192,l-ACU-G_,191,l-ACU-G_,190,r-AG-CU-_,235,l-AU_-C-G,234,r-CGU-A_,233,r-AGU-C_,229,r-AGU_-C
-
sub,176,75
-
RF00023
-
hmm (compact-type)
-
hmm (expanded-type)
-
store-pair,104,r-ACGU-_,103,l-ACGU-_,160,l-ACGU-_,159,l-ACGU-_,199,r-ACGU-_,198,l-ACGU-_,239,r-ACGU-_,238,l-ACGU-_,284,l-ACGU-_,283,r-ACGU-_,302,l-ACGU-_,299,r-AU-CG-_
-
sub,189,100/sub,227,100
-
sub,189,100/sub,227,100/sub,270,100
-
RF00029
-
hmm (expanded-type)
-
store-pair,24,l-A-C-G-U-_,42,l-A-C-G-U-_
-
sub,20,30
-
store-pair,23,l-A-C-G-U-_,24,l-A-C-G-U-_,41,l-A-C-G-U-_,42,l-A-C-G-U-_
-
RF00059
-
hmm (expanded-type)
-
store-pair,44,l-A-C-G-U-_,77,l-A-C-G-U-_,116,l-A-C-G-U-_,117,l-A-C-G-U-_
-
sub,113,40
-
store-pair,44,l-A-C-G-U-_,77,l-A-C-G-U-_,115,l-A-C-G-U-_,116,l-A-C-G-U-_,117,l-A-C-G-U-_
-
sub,103,100
-
RF00168
-
hmm (expanded-type)
-
store-pair,56,l-ACGU-_,54,l-AG_-CU,93,l-ACU-G_,92,l-CGU-A-_,110,r-AG-CU-_,109,l-ACU-G_,125,l-ACG-U-_,124,r-AC-GU-_
-
sub,86,40
-
sub,100,65
-
sub,118,70/sub,100,65
-
RF00174
-
store-pair,41,r-CU_-AG,40,r-ACU-G_,79,l-ACU_-G,78,r-AGU_-C,93,l-ACU-G_,92,r-AGU_-C,91,l-ACG-U_,154,r-ACU-G-_,153,r-AGU-C_
-
store-pair,41,l-AG-CU-_,40,r-ACU-G_,39,l-AGU-C_,79,l-ACU-G-_,78,r-AGU-C-_,76,r-ACU-G_,93,l-ACU-G_,92,r-AGU-C-_,91,l-AC-G-U-_,154,r-ACU-G-_,153,r-A-C-G-U-_,152,l-ACU-G_
-
store-pair,41,l-AG-CU-_,40,l-AGU-C-_,39,r-CU-A-G-_,79,l-ACU-G-_,78,r-AGU-C-_,77,l-AG_-CU,76,r-ACU-G_,75,r-CU_-AG,93,r-ACU-G-_,92,r-AGU-C-_,91,r-A-C-G-U-_,154,r-CU-A-G-_,153,l-AC-GU-_,152,r-AG-CU-_,150,r-ACU_-G
-
store-pair,41,r-CU-A-G-_,40,l-AG-C-U-_,39,r-CU-A-G-_,79,l-ACU-G-_,78,r-AGU-C-_,77,r-AU-C_-G,76,r-AU-C_-G,75,r-CU_-AG,93,r-ACU-G-_,92,r-AG-C-U-_,91,r-A-C-G-U-_,154,r-ACU-G-_,153,r-A-C-G-U-_,152,r-AG-CU-_,150,r-ACU_-G,142,l-ACU-G-_
-
sub,133,240/sub,91,40
-
sub,133,240/sub,75,430
-
sub,101,450
-
tRNAscan-SE archaea
-
hmm (expanded-type)
-
store-pair,19,r-CGU-A_,18,l-ACG-U_,37,r-A-C-G-U-_,57,l-ACGU-_,52,l-ACGU-_,70,r-AGU_-C,69,r-AGU-C-_
-
store-pair,19,r-AU-CG-_,18,l-AG-C-U-_,37,l-GU-A-C-_,35,r-AGU-C_,34,r-ACU-G_,57,l-ACGU-_,54,r-ACGU-_,53,l-ACGU-_,52,l-ACGU-_,70,r-AGU_-C,69,r-AGU_-C,68,r-AGU-C-_
-
store-pair,19,r-AU-CG-_,18,l-AG-C-U-_,17,l-AG-C-U-_,37,l-GU-A-C-_,36,l-ACU-G-_,35,r-AGU-C_,34,r-ACU-G_,57,l-ACGU-_,54,r-ACGU-_,53,l-ACGU-_,52,l-ACGU-_,51,l-ACGU-_,70,r-AGU-C-_,69,r-AGU-C-_,68,r-AGU-C-_,67,r-ACU-G_
-
sub,16,87/sub,66,45/sub,51,24
-
sub,33,250/sub,16,87/sub,66,45/sub,51,24
-
tRNAscan-SE eubacterial
-
store-pair,18,l-AG-C_-U,37,r-A_-C-G-U,57,l-ACGU-_,55,l-ACGU-_,70,l-ACU_-G,69,l-ACU_-G
-
store-pair,19,r-ACU_-G,18,l-ACG-U_,17,r-ACU_-G,37,l-A-C-G-U-_,36,l-ACU_-G,56,l-ACGU-_,55,l-ACGU-_,54,r-ACGU-_,70,l-ACU_-G,69,l-ACU_-G,68,l-CU_-AG
-
store-pair,19,r-ACU-G-_,18,l-AG-C_-U,17,r-ACU_-G,37,r-A_-C-G-U,36,l-ACU_-G,35,l-AC-U_-G,56,l-ACGU-_,55,l-ACGU-_,54,r-ACGU-_,53,r-ACGU-_,70,l-ACU_-G,69,l-ACU_-G,68,l-CU_-AG,67,l-AG_-CU
-
store-pair,19,r-ACU_-G,18,l-AC-G-U-_,17,r-ACU_-G,16,l-ACU_-G,37,r-A_-C-G-U,35,l-A_-C-G-U,34,r-A_-CU-G,56,l-ACGU-_,55,l-ACGU-_,54,r-ACGU-_,53,r-ACGU-_,52,l-ACGU-_,70,l-ACU_-G,69,l-ACU_-G,68,l-A-C-G-U-_,67,l-AG_-CU
-
sub,33,56/sub,16,56/sub,51,40
-
sub,33,56/sub,66,63/sub,51,40/sub,16,56
-
tRNAscan-SE eukaryotic nuclear (used for Drosophila, C. elegans, and
human).
-
store-pair,15,r-A-C-G-U-_,34,r-AG-C_-U,32,r-AU_-CG,51,l-ACGU-_,50,r-ACGU-_,67,l-ACU_-G,65,r-AG-U_-C
-
store-pair,16,l-AG_-CU,15,r-A-C-G-U-_,14,r-CU_-A-G,34,r-A_-C-G-U,32,r-A_-C-G-U,31,l-AG_-CU,54,l-ACGU-_,51,l-ACGU-_,50,r-ACGU-_,49,l-ACGU-_,48,r-ACGU-_,67,l-ACU_-G,66,r-AGU_-C,65,l-ACU-G_,64,r-ACU-G_,63,r-AG_-CU
-
sub,63,48/sub,48,32/sub,13,63
-
sub,63,48/sub,30,500/sub,13,63
-
tRNAscan-SE eukaryotic nuclear selenocysteine (this is an easy family that does
not require the techniques presented in this paper)
-
hmm (expanded-type)
About the .csv format
The .csv file is a comma-separated values file, intended to be viewed in Microsoft
Excel or a similar program. It is also plain text, so it can be viewed in
any text editor, although the files are very long and therefore hard to
read that way. With a simple script (e.g., in Perl), you could convert it to
other formats.
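As one example of such a script, here is a minimal sketch in Python (rather than Perl) that re-emits the rows as tab-separated text. It assumes only that the file follows standard CSV quoting; the function name csv_to_tsv is hypothetical.

```python
import csv
import io


def csv_to_tsv(csv_text: str) -> str:
    """Re-emit comma-separated rows as tab-separated rows (hypothetical helper)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # csv.reader handles quoted fields, so a quoted comma stays inside its field.
    for row in csv.reader(io.StringIO(csv_text)):
        writer.writerow(row)
    return out.getvalue()


print(csv_to_tsv("a,b,c\n1,2,3\n"), end="")
```

To convert a whole file, reading it with open(path, newline="") and passing its contents to csv_to_tsv should work; path is a placeholder for one of the downloaded .csv files.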
The columns in the .csv file are:
-
(Probably not useful.) The parameters used to run my scan program, with
commas changed to semicolons so that they fit into a single .csv field.
-
(Probably not useful.) The CM file name used in the scan, which basically
just says what the Rfam accession ID is.
-
(Probably not useful.) The name of the genome sequence file that the hit
is in. This will be a file that's part of the EMBL nucleotide
database, release 76 (since RFAMSEQ, the nucleotide database searched
by Rfam for all families, is basically a subset of EMBL).
-
(Probably not useful.) sequence #: which sequence this is within the EMBL
file in the previous column, starting with 0.
-
The EMBL accession ID of the sequence in which the putative homolog was
found. This accession ID is for EMBL release 76 (which will eventually be
an old version of EMBL).
-
(Probably not useful.) Whether or not the given EMBL accession ID is
part of the Rfam 5.0 version of RFAMSEQ. Always 1 (yes). This field was
useful to me when I had done scans on an old version of RFAMSEQ; now it's not
useful even to me, because it's always 1.
-
A description of the EMBL sequence, generated from the EMBL format by the
sp2fasta program (part of WU-BLAST), which converts EMBL-format sequence files
into FASTA-format files.
-
Is reversed? If 0, the homolog is on the forward strand; if 1, the
homolog is on the reverse strand.
-
Ordinal hit #, starting at 0; this simply numbers all the homologs.
-
start nuc. The first nucleotide in the sequence that is part of the
putative homolog. If the homolog is on the reverse strand, start nuc will
be greater than end nuc.
-
end nuc. Last nucleotide of the homolog.
-
score (bits). The logarithmic score assigned to the homolog by the
family-specific covariance model. In general, higher scores are more
likely to be true homologs. (With weasel words: higher scores indicate
sequences that are more similar to the training data for the family.)
-
The Rfam-format homolog IDs for all family members (homologs) in Rfam 5.0 that
have exactly the same nucleotide sequence. The hit ID format is
EMBL-accession/start-nuc,end-nuc.
-
Hit ranges (start-nuc - end-nuc) of all Rfam 5.0 hits that are in the same EMBL
sequence as the given homolog.
-
"new hitness". 0 means the homolog is already in Rfam 5.0. 1 means
that the homolog overlaps (with at least 1 nucleotide) something already in
Rfam, which usually means that the more sensitive search technique may have
found slightly better nucleotide boundaries for the homolog, but
that it's fundamentally a hit in Rfam 5.0. 2 means that the sequence was
not in Rfam 5.0.
-
Biggest overlap (in nucleotides) with a homolog already in Rfam 5.0. An
overlap of 0 generally means that the homolog is not in Rfam 5.0.
An overlap of fewer than 10 nucleotides usually means that the homolog is close to
another homolog but is novel, and an overlap of more than 10 usually means it's already there.
-
(Probably not useful.) This homolog's ID in Rfam homolog ID format.
I forget why I put this in.
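For readers who want to pull the new homologs out programmatically, here is a minimal Python sketch that reads rows into records and keeps the ones with a "new hitness" of 2, best score first. The 0-based column positions are assumed from the column order listed above (parameters = 0, ..., Rfam-format hit ID = 16), and the sketch also assumes that no field contains an unquoted comma; both should be checked against an actual file before relying on it. The names Hit, read_hits, and novel_hits are hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List

# 0-based column positions, ASSUMED from the column list on this page.
COL_ACCESSION, COL_REVERSED = 4, 7
COL_START, COL_END, COL_SCORE = 9, 10, 11
COL_NEW_HITNESS, COL_OVERLAP = 14, 15


@dataclass
class Hit:
    accession: str
    is_reversed: bool
    start: int
    end: int
    score: float
    new_hitness: int  # 0 = in Rfam 5.0, 1 = overlaps an Rfam 5.0 hit, 2 = not in Rfam 5.0
    overlap: int

    @property
    def length(self) -> int:
        # On the reverse strand start > end, so use the absolute difference.
        return abs(self.end - self.start) + 1


def read_hits(lines: Iterable[str]) -> List[Hit]:
    hits = []
    for row in csv.reader(lines):
        if len(row) <= COL_OVERLAP:
            continue  # blank or truncated line
        try:
            hits.append(Hit(
                accession=row[COL_ACCESSION],
                is_reversed=row[COL_REVERSED].strip() == "1",
                start=int(row[COL_START]),
                end=int(row[COL_END]),
                score=float(row[COL_SCORE]),
                new_hitness=int(row[COL_NEW_HITNESS]),
                overlap=int(row[COL_OVERLAP]),
            ))
        except ValueError:
            continue  # non-numeric fields, e.g. a header row if one exists
    return hits


def novel_hits(hits: List[Hit]) -> List[Hit]:
    """Hits that were not in Rfam 5.0 (new hitness 2), highest score first."""
    return sorted((h for h in hits if h.new_hitness == 2),
                  key=lambda h: h.score, reverse=True)
```

Passing an open file object (opened with newline="") to read_hits should work, since csv.reader accepts any iterable of lines. Sorting by score follows the note above that higher scores are more likely to be true homologs.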
About the .cmzasha format
The .cmzasha file is the raw output of our software. The information in it
is essentially redundant with the .csv file. The additional information is:
-
The .cmzasha file gives an alignment of each putative homolog to the family's
covariance model in the format of the Infernal
software package
used by Rfam.
-
The end of the .cmzasha file gives the filtering fractions for the rigorous
profile HMM filter(s) used. Note that, in contrast to the formalism of the
paper, the filtering fraction in the .cmzasha file is a fraction of the output
of the previous filter (or of the original database, if there is no previous filter),
whereas the paper always gives the filtering fraction relative to the original
database. This affects the cases where both compact- and expanded-type
HMMs were used in series.
-
The end of the .cmzasha file also gives the CPU user time, in seconds, taken for the
scan.
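To compare the per-filter fractions in a .cmzasha file with the paper's convention, they can be chained by multiplication. The Python sketch below does that, assuming each reported value is the fraction of its input that survives the filter; the function name cumulative_fractions and the numbers in the example are hypothetical.

```python
from itertools import accumulate
from operator import mul
from typing import List


def cumulative_fractions(per_filter: List[float]) -> List[float]:
    """Turn fractions relative to the previous filter's output (.cmzasha
    convention) into fractions relative to the original database (paper
    convention). Assumes each value is the surviving fraction of its input."""
    return list(accumulate(per_filter, mul))


# Hypothetical numbers: a compact-type HMM keeps 10% of the database, and the
# expanded-type HMM then keeps half of what the compact-type HMM let through.
print(cumulative_fractions([0.1, 0.5]))  # [0.1, 0.05]
```

Relative to the original database, the expanded-type HMM's fraction is therefore 0.05, not the 0.5 that would appear in the .cmzasha file.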