University of Washington Computer Science & Engineering
 Motif Discovery Assessment: Download

You can download any of three complete benchmarks of data sets. The data sets come from four species: human (file names with prefix hm), mouse (file names with prefix mus), Drosophila melanogaster (file names with prefix dm), and Saccharomyces cerevisiae (file names with prefix yst). The three benchmarks contain the same binding sites but differ in how the sequences outside the binding sites were constructed:

  1. The "real" benchmark (file names with suffix r) has the binding sites in their real genomic promoter sequences.
  2. The "generic" benchmark (file names with suffix g) has the binding sites planted in randomly chosen genomic promoter sequences from the same organism.
  3. The "markov" benchmark (file names with suffix m) has the binding sites planted in sequences randomly generated according to a Markov chain of order 3 that was constructed from the promoter sequences of the same organism.

The locations of the binding sites, as given by the TRANSFAC database, can be found here. The format used in the location file is described here.
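
To make the "markov" construction more concrete, here is a minimal Python sketch of how a background sequence could be generated from an order-3 Markov chain and how a binding site could then be planted in it. This is an illustration only, not the procedure actually used to build the benchmark. The function names, the toy promoter sequences, the uniform fallback for unseen contexts, the random starting context, and the choice to overwrite (rather than insert) bases when planting are all assumptions.

```python
import random
from collections import Counter, defaultdict

ORDER = 3          # order of the Markov chain, as stated for the "markov" benchmark
ALPHABET = "ACGT"  # assumption: only unambiguous DNA bases are modeled


def train_markov(sequences, order=ORDER):
    """Count which base follows each context of `order` bases."""
    counts = defaultdict(Counter)
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, nxt = seq[i - order:i], seq[i]
            # Skip windows containing ambiguous symbols such as N.
            if set(context + nxt) <= set(ALPHABET):
                counts[context][nxt] += 1
    return counts


def generate_background(counts, length, rng, order=ORDER):
    """Sample a sequence of `length` bases from the trained chain."""
    # Assumption: the first `order` bases are drawn uniformly at random.
    seq = [rng.choice(ALPHABET) for _ in range(order)]
    while len(seq) < length:
        followers = counts.get("".join(seq[-order:]))
        if followers:
            bases = list(followers)
            weights = [followers[b] for b in bases]
            seq.append(rng.choices(bases, weights=weights)[0])
        else:
            # Assumption: unseen contexts fall back to a uniform choice.
            seq.append(rng.choice(ALPHABET))
    return "".join(seq[:length])


def plant_site(background, site, position):
    """Overwrite `background` with `site` starting at 0-based `position`."""
    if position < 0 or position + len(site) > len(background):
        raise ValueError("site does not fit at the requested position")
    return background[:position] + site + background[position + len(site):]


if __name__ == "__main__":
    # Hypothetical toy "promoters"; real input would be the organism's promoters.
    promoters = ["ACGTTGCATGCAAGCTTGCA", "TTGACAGCTAGCTCAGTCCT"]
    rng = random.Random(0)
    model = train_markov(promoters)
    background = generate_background(model, 60, rng)
    print(plant_site(background, "TGACTCA", 20))
```

The same `plant_site` step would apply to the "generic" benchmark, with a randomly chosen genomic promoter taking the place of the generated background.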


Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to tompa]