University of Washington Computer Science & Engineering
 Motif Discovery Assessment: Evaluating the Results

The known binding sites in the data sets, according to TRANSFAC, are listed on a separate page of the assessment site.

The statistical analysis of the predicted motifs generally follows ideas used in a similar assessment of gene prediction programs:

  • M. Burset and R. Guigo (1996), Evaluation of gene structure prediction programs. Genomics 34: 353-367.

For each data set, the organizers compared the predicted binding site locations with the known binding site locations and computed measures of prediction accuracy. Let K be the set of nucleotide positions occupied by known binding sites and P the set of nucleotide positions occupied by predicted binding sites. (For instance, if you predicted 10 non-overlapping binding sites, each of length 8, P would consist of 80 positions.) For each data set and each participating program, the nucleotide-level measures are the number of true positives nTP = |K ∩ P|, the number of false negatives nFN = |K − P|, and the number of false positives nFP = |P − K|.
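The nucleotide-level counts reduce to plain set operations. The following is a minimal Python sketch, not the organizers' code; it assumes a hypothetical representation in which each site is a (sequence_id, start, length) tuple with a 0-based start, and it ignores strand.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation (an assumption, not the assessment's file format):
# a site is (sequence_id, start, length) with a 0-based start; strand is ignored.
Site = Tuple[str, int, int]
Position = Tuple[str, int]


def positions(sites: Iterable[Site]) -> Set[Position]:
    """Return the set of (sequence_id, offset) positions covered by the sites."""
    covered: Set[Position] = set()
    for seq_id, start, length in sites:
        covered.update((seq_id, i) for i in range(start, start + length))
    return covered


def nucleotide_counts(known: Iterable[Site], predicted: Iterable[Site]) -> Tuple[int, int, int]:
    """Return (nTP, nFN, nFP) = (|K ∩ P|, |K - P|, |P - K|)."""
    K = positions(known)
    P = positions(predicted)
    return len(K & P), len(K - P), len(P - K)


if __name__ == "__main__":
    known = [("seq1", 10, 12)]                      # covers seq1 offsets 10..21
    predicted = [("seq1", 15, 8), ("seq2", 0, 8)]   # seq1 15..22 and seq2 0..7
    print(nucleotide_counts(known, predicted))      # (7, 5, 9)
```

Because P is a set, positions shared by overlapping predictions are counted only once: ten length-8 sites starting at offsets 0 through 9 of one sequence cover just 17 positions, whereas ten non-overlapping ones cover 80.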

Analogous measures can be computed at the binding site level. Here sTP is the number of known binding sites overlapped by at least one predicted binding site, sFN is the number of known binding sites not overlapped by any predicted binding site, and sFP is the number of predicted binding sites not overlapped by any known binding site.
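The binding-site-level counts can be sketched in the same way. This is again an illustration under stated assumptions: sites use hypothetical (sequence_id, start, length) tuples with 0-based starts, and "overlap" is taken to mean sharing at least one nucleotide on the same sequence, ignoring strand.

```python
from typing import List, Tuple

# Hypothetical representation (an assumption): (sequence_id, 0-based start, length).
Site = Tuple[str, int, int]


def overlaps(a: Site, b: Site) -> bool:
    """True if the two sites share at least one nucleotide on the same sequence."""
    seq_a, start_a, len_a = a
    seq_b, start_b, len_b = b
    return seq_a == seq_b and start_a < start_b + len_b and start_b < start_a + len_a


def site_counts(known: List[Site], predicted: List[Site]) -> Tuple[int, int, int]:
    """Return (sTP, sFN, sFP) under the 'shares at least one nucleotide' overlap rule."""
    s_tp = sum(1 for k in known if any(overlaps(k, p) for p in predicted))
    s_fn = len(known) - s_tp  # known sites hit by no prediction
    s_fp = sum(1 for p in predicted if not any(overlaps(p, k) for k in known))
    return s_tp, s_fn, s_fp


if __name__ == "__main__":
    known = [("seq1", 10, 12), ("seq1", 50, 10)]
    predicted = [("seq1", 15, 8), ("seq1", 18, 6), ("seq2", 0, 8)]
    print(site_counts(known, predicted))  # (1, 1, 1)
```

Since sTP counts known sites while sFP counts predicted sites, sTP + sFP need not equal the number of predictions: in the example, two predictions hit the same known site and one hits nothing, so three predictions yield sTP = 1 and sFP = 1.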

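At both the nucleotide and the binding site level, one can then compute the sensitivity Sn = TP/(TP+FN), the positive predictive value PPV = TP/(TP+FP), and other measures of accuracy, where TP, FN, and FP stand for nTP, nFN, and nFP at the nucleotide level and for sTP, sFN, and sFP at the binding site level. A complete list of the statistics used is available on a separate page of the assessment site.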
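Given the counts at either level, the two ratios take only a few lines of Python. This sketch is illustrative; returning NaN when a denominator is zero (for example, a program that made no predictions) is an assumption, not a convention stated by the assessment.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Sn = TP / (TP + FN): the fraction of known positions (or known sites) that were hit."""
    # Assumption: report NaN when there is nothing to detect.
    return tp / (tp + fn) if tp + fn > 0 else math.nan


def positive_predictive_value(tp: int, fp: int) -> float:
    """PPV = TP / (TP + FP); at the nucleotide level, the fraction of predicted positions in known sites."""
    # Assumption: report NaN when there are no predictions.
    return tp / (tp + fp) if tp + fp > 0 else math.nan


if __name__ == "__main__":
    # Hypothetical nucleotide-level counts: TP = 7, FN = 5, FP = 9.
    print(round(sensitivity(7, 5), 4))        # 0.5833
    print(positive_predictive_value(7, 9))    # 0.4375
```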


Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to tompa]