You can download any of three complete benchmarks of data sets. The data sets come from four species: human (file names with prefix hm), mouse (prefix mus), Drosophila melanogaster (prefix dm), and Saccharomyces cerevisiae (prefix yst).

The locations of the binding sites, as given by the TRANSFAC database, can be found here. The format used in the location file is described here.

The three benchmarks contain the same binding sites but differ in how the sequences outside the binding sites were constructed:

  1. The "real" benchmark (file names with suffix r) has the binding sites in their real genomic promoter sequences.
  2. The "generic" benchmark (file names with suffix g) has the binding sites planted in randomly chosen genomic promoter sequences from the same organism.
  3. The "markov" benchmark (file names with suffix m) has the binding sites planted in sequences randomly generated according to a Markov chain of order 3 that was constructed from the promoter sequences of the same organism.
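
To make the "markov" construction concrete, the following is a minimal sketch of how a background sequence could be drawn from an order-3 Markov chain trained on promoter sequences, with a binding site then planted in it. It is an illustration only, not the procedure actually used to build the benchmark: the function names (train_markov, generate, plant) are hypothetical, the choice of the starting context and the handling of unseen contexts are assumptions, and so is the decision to overwrite background bases when planting a site rather than insert the site between them.

```python
import random
from collections import Counter, defaultdict

ORDER = 3  # order of the Markov chain, as stated for the "markov" benchmark


def train_markov(sequences, order=ORDER):
    """Count how often each base follows each context of `order` bases."""
    counts = defaultdict(Counter)
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - order):
            counts[seq[i:i + order]][seq[i + order]] += 1
    return counts


def generate(counts, length, rng, order=ORDER):
    """Sample a sequence of `length` bases from the trained chain.

    Assumption: the first `order` bases are a uniformly chosen observed
    context, and an unseen context restarts from a fresh observed one.
    """
    if not counts:
        raise ValueError("the model has no observed contexts")
    contexts = sorted(counts)
    seq = list(rng.choice(contexts))
    while len(seq) < length:
        followers = counts.get("".join(seq[-order:]))
        if not followers:
            seq.extend(rng.choice(contexts))
            continue
        bases = sorted(followers)
        weights = [followers[b] for b in bases]
        seq.append(rng.choices(bases, weights=weights)[0])
    return "".join(seq[:length])


def plant(background, site, position):
    """Overwrite `background` with `site` starting at `position` (assumption)."""
    if position < 0 or position + len(site) > len(background):
        raise ValueError("site does not fit at this position")
    return background[:position] + site + background[position + len(site):]
```

A call such as plant(generate(train_markov(promoters), 500, random.Random(1)), site, 100) would then yield one synthetic sequence, where promoters and site stand for data you supply yourself.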

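The naming convention above (species prefix, benchmark suffix) can be decoded mechanically. The sketch below assumes that a file name consists of the species prefix, some identifier, and the one-letter benchmark suffix, optionally followed by a file extension; the example name hm01r.fasta is hypothetical and only illustrates that assumed layout, and classify is a hypothetical helper name.

```python
import re
from pathlib import Path

# Prefixes and suffixes as described in the text above.
SPECIES = {
    "hm": "human",
    "mus": "mouse",
    "dm": "Drosophila melanogaster",
    "yst": "Saccharomyces cerevisiae",
}
BENCHMARK = {"r": "real", "g": "generic", "m": "markov"}

# Assumed layout: <species prefix><identifier><benchmark suffix>
NAME_RE = re.compile(r"^(hm|mus|dm|yst)(.*?)([rgm])$")


def classify(filename):
    """Return (species, benchmark, identifier), or None if the name does not fit."""
    stem = Path(filename).stem  # drop any directory part and extension
    match = NAME_RE.match(stem)
    if match is None:
        return None
    prefix, identifier, suffix = match.groups()
    return SPECIES[prefix], BENCHMARK[suffix], identifier
```

Names that do not follow the assumed layout return None instead of raising, so the helper can be used to filter a directory listing.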
