The known binding sites in the data sets, according to TRANSFAC, can be found here. The statistical analysis of the predicted motifs generally follows ideas used in a similar assessment of gene prediction programs.
For each data set, the organizers took the predicted locations and the known binding site locations and computed measures of the accuracy of the predictions. Let K be the set of nucleotide positions occupied by known binding sites and P be the set of nucleotide positions occupied by the predicted binding sites. (For instance, if you predicted 10 non-overlapping binding sites each of length 8, P would consist of 80 positions.)

For each data set and each participating program, the nucleotide-level measures consist of the number nTP = |K ∩ P| of true positives, the number nFN = |K − P| of false negatives, and the number nFP = |P − K| of false positives.

Analogous measures are calculated at the binding site level. Here sTP is the number of known binding sites overlapped by some predicted binding site, sFN is the number of known binding sites not overlapped by any predicted binding site, and sFP is the number of predicted binding sites not overlapped by any known binding site.

At both the nucleotide and binding site level, one can then compute the sensitivity Sn = TP/(TP+FN), the positive predictive value PPV = TP/(TP+FP), and other such measures of accuracy. You can read a complete list of the statistics used.
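The counts defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the organizers' actual scoring code: sites are represented as hypothetical (start, length) pairs on a single sequence, "overlapped" is taken to mean sharing at least one nucleotide position (the assessment may have used a stricter criterion), and the function names and the example coordinates are invented for this sketch.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: a site is (start, length) on one sequence.
Site = Tuple[int, int]


def positions(sites: List[Site]) -> Set[int]:
    """Set of nucleotide positions covered by the given sites."""
    return {p for start, length in sites for p in range(start, start + length)}


def overlaps(a: Site, b: Site) -> bool:
    """Assumption: two sites overlap if they share at least one position."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def ratio(num: int, den: int) -> Optional[float]:
    """Return num/den, or None when the measure is undefined (den == 0)."""
    return num / den if den else None


def accuracy(known: List[Site], predicted: List[Site]) -> Dict[str, Optional[float]]:
    K, P = positions(known), positions(predicted)

    # Nucleotide-level counts.
    nTP, nFN, nFP = len(K & P), len(K - P), len(P - K)

    # Site-level counts.
    sTP = sum(any(overlaps(k, p) for p in predicted) for k in known)
    sFN = len(known) - sTP
    sFP = sum(not any(overlaps(p, k) for k in known) for p in predicted)

    return {
        "nTP": nTP, "nFN": nFN, "nFP": nFP,
        "sTP": sTP, "sFN": sFN, "sFP": sFP,
        "nSn": ratio(nTP, nTP + nFN), "nPPV": ratio(nTP, nTP + nFP),
        "sSn": ratio(sTP, sTP + sFN), "sPPV": ratio(sTP, sTP + sFP),
    }


# Made-up example: two known sites, two predicted sites.
known_sites = [(10, 8), (40, 6)]
predicted_sites = [(12, 8), (100, 8)]
result = accuracy(known_sites, predicted_sites)
print(result)
```

In this made-up example K has 14 positions and P has 16, of which 6 are shared, so nTP = 6, nFN = 8, and nFP = 10. One of the two known sites is overlapped and one of the two predictions overlaps nothing, giving sTP = sFN = sFP = 1. Working with sets means that overlapping predictions are counted once at the nucleotide level.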
Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350. (206) 543-1695 voice, (206) 543-2969 FAX [comments to tompa]