>program name oligo/dyad-analysis >data set yst01 >parameters -v 1 -thosig 0 -purge -org Saccharomyces_cerevisiae -seq Saccharomyces_cerevisiae_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/rand_gene_selections >postprocessing >data set yst02 >parameters -v 1 -thosig 0 -purge -org Saccharomyces_cerevisiae -seq Saccharomyces_cerevisiae_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/rand_gene_selections >postprocessing >data set yst03 >parameters -v 1 -thosig 0 -purge -org Saccharomyces_cerevisiae -seq Saccharomyces_cerevisiae_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/rand_gene_selections >postprocessing We selected the words assembled in the most significant pattern cluster (GAGTCA, GAGTCAA, AGTCAT), and set the threshold to 0.74, which corresponds to the largest pattern of this cluster (the heptanucleotide) and encompasses the two other ones. This threshold was selected to filter out AGTCAT, which probably reflects the flanking preferences of the motif, but does not contain the most significant core (GAGTCA). 
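The containment argument used for yst03 can be checked with a few lines of Python. This is only a sketch: the actual filtering was done with the sig threshold, and the helper names (`reverse_complement`, `contains_core`) are hypothetical. Both strands are considered because the analysis was run with -2str.

```python
# Sketch: which words of the yst03 cluster contain the core GAGTCA?
# Helper names are illustrative, not part of the original pipeline.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word: str) -> str:
    """Return the reverse complement of a DNA word."""
    return "".join(COMPLEMENT[base] for base in reversed(word.upper()))


def contains_core(word: str, core: str) -> bool:
    """True if the word contains the core on either strand."""
    word, core = word.upper(), core.upper()
    return core in word or reverse_complement(core) in word


cluster = ["GAGTCA", "GAGTCAA", "AGTCAT"]
core = "GAGTCA"

retained = [w for w in cluster if contains_core(w, core)]
print(retained)  # ['GAGTCA', 'GAGTCAA']
```

AGTCAT drops out because it contains neither GAGTCA nor its reverse complement TGACTC, which is consistent with the note above.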
>data set yst04 >parameters -v 1 -thosig 0 -purge -org Saccharomyces_cerevisiae -seq Saccharomyces_cerevisiae_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/rand_gene_selections >postprocessing We selected the most significant pattern. >data set yst05 >parameters -v 1 -thosig 0 -purge -org Saccharomyces_cerevisiae -seq Saccharomyces_cerevisiae_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/rand_gene_selections >postprocessing If we have to select one, it's the octanucleotide cccacaga. Its sig is low, but it gives a higher sig with the binomial model (0.81), and there are few sequences (3). >data set yst06 >parameters -v 1 -thosig 0 -purge -org Saccharomyces_cerevisiae -seq Saccharomyces_cerevisiae_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/rand_gene_selections >postprocessing The three most significant patterns assemble, in all cases but one, into a single nonanucleotide. One occurrence of attagga is isolated, and has been retained because the pattern is quite big (7 nt) and could be the core of the binding site cluster (aattagga, attagga, attaggaa). 
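To illustrate how the three yst06 words assemble into a nonanucleotide, here is a minimal greedy overlap assembly in Python. It is a sketch under simplifying assumptions: it works on a single strand, uses a hypothetical minimum overlap of 4 nt, and is not the assembly procedure of the original tool. The function names are illustrative.

```python
# Sketch: greedy assembly of overlapping words (single strand only).

def merge_pair(left, right, min_overlap=4):
    """Merge `right` onto `left` if a suffix of left equals a prefix of right.

    Returns the merged string, or None if the overlap is too short.
    """
    if right in left:
        return left
    for k in range(min(len(left), len(right)) - 1, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


def assemble(words, min_overlap=4):
    """Greedily grow one contig; return it with the words left unassembled."""
    remaining = sorted(dict.fromkeys(words), key=len, reverse=True)
    contig = remaining.pop(0)
    progress = True
    while remaining and progress:
        progress = False
        for word in list(remaining):
            merged = (merge_pair(contig, word, min_overlap)
                      or merge_pair(word, contig, min_overlap))
            if merged:
                contig = merged
                remaining.remove(word)
                progress = True
    return contig, remaining


contig, leftover = assemble(["aattagga", "attagga", "attaggaa"])
print(contig, len(contig))  # aattaggaa 9
```

The heptanucleotide attagga is fully contained in the assembled word, which is why it can act as the core when it occurs without the flanking positions.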
>data set yst07 >parameters -v 1 -thosig 0 -purge -org Saccharomyces_cerevisiae -seq Saccharomyces_cerevisiae_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/rand_gene_selections >postprocessing >data set yst08 >parameters -v 1 -thosig 0 -purge -org Saccharomyces_cerevisiae -seq Saccharomyces_cerevisiae_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/rand_gene_selections >postprocessing In this case, we found 2 clusters which are likely to be regulatory sequences: In the first one, the hexanucleotide aggaag does not seem to be sufficient: there are a lot of occurrences in some sequences. In contrast, the heptanucleotide taggaag can be found in 7/11 sequences, with 2 occurrences in 1 sequence only. 
In the second, the octanucleotide gccagaaa seems to be a good candidate (big motif, 1 occurrence in 7/11 sequences). The other words which are part of this cluster are discarded because they are essentially polyA, and have only a small part overlapping with the octanucleotide. >data set yst09 >parameters -v 1 -thosig 0 -purge -org Saccharomyces_cerevisiae -seq Saccharomyces_cerevisiae_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/rand_gene_selections >postprocessing Nothing convincing. >data set yst10 >parameters -v 1 -thosig 0 -purge -org Saccharomyces_cerevisiae -seq Saccharomyces_cerevisiae_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Saccharomyces_cerevisiae/rand_gene_selections >postprocessing The only pattern with an acceptable sig is atatag, but it's a hexanucleotide, and its distribution among the sequences (6 occurrences in one sequence) seems bad. >data set dm01 >parameters -v 1 -thosig 0 -purge -org Drosophila_melanogaster -seq Drosophila_melanogaster_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/rand_gene_selections >postprocessing For the most significant cluster of words, we only selected the heptanucleotide ACAGCAA, because hexanucleotides are likely to return false positions. 
We also chose the 2 octanucleotides CACGGCCA and CGCTTTAC, which have a reasonably good significance. >data set dm02 >parameters -v 1 -thosig 0 -purge -org Drosophila_melanogaster -seq Drosophila_melanogaster_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/rand_gene_selections >postprocessing This is a very delicate case because there is only one sequence. The three oligonucleotides have a quite low sig. There are however 2 dyads, cggn{8}ggc and cggn{9}gcg, which have a high sig (3.03 and 2.04), and which can be assembled to form CGGn{8}GGCG. This site seems reasonable (3 occurrences). >data set dm03 >parameters -v 1 -thosig 0 -purge -org Drosophila_melanogaster -seq Drosophila_melanogaster_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/rand_gene_selections >postprocessing >data set dm04 >parameters -v 1 -thosig 0 -purge -org Drosophila_melanogaster -seq Drosophila_melanogaster_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/rand_gene_selections >postprocessing >data set dm05 >parameters -v 1 -thosig 0 -purge -org Drosophila_melanogaster -seq Drosophila_melanogaster_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/multi/calibN_bg-purge -2str -minol 6 
-maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/rand_gene_selections >postprocessing We selected the 2 most significant patterns, because they have a reasonable distribution (occurrences in the 3 sequences). >data set dm06 >parameters -v 1 -thosig 0 -purge -org Drosophila_melanogaster -seq Drosophila_melanogaster_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/rand_gene_selections >postprocessing >data set dm07 >parameters -v 1 -thosig 0 -purge -org Drosophila_melanogaster -seq Drosophila_melanogaster_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/rand_gene_selections >postprocessing The most significant pattern was chosen; its distribution seems reasonable. 
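Returning to the dm02 record: the assembly of the two dyads cggn{8}ggc and cggn{9}gcg into CGGn{8}GGCG can be verified by overlaying the two patterns position by position. The sketch below assumes that both dyads are aligned on their left-hand trinucleotide (as the assembled form implies) and that n matches any nucleotide. The function names are hypothetical.

```python
# Sketch: overlay two spaced dyads written in the n{k} notation.
import re


def expand(dyad):
    """Expand 'cggn{8}ggc' into 'cgg' + 8 * 'n' + 'ggc'."""
    return re.sub(r"n\{(\d+)\}", lambda m: "n" * int(m.group(1)), dyad.lower())


def overlay(first, second):
    """Overlay two left-aligned patterns; return None if they conflict."""
    a, b = expand(first), expand(second)
    width = max(len(a), len(b))
    a, b = a.ljust(width, "n"), b.ljust(width, "n")
    merged = []
    for x, y in zip(a, b):
        if x == "n":
            merged.append(y)
        elif y == "n" or x == y:
            merged.append(x)
        else:
            return None  # two different fixed nucleotides at one position
    return "".join(merged)


def compress(pattern):
    """Collapse runs of two or more n back into the n{k} notation."""
    return re.sub(r"n{2,}", lambda m: "n{%d}" % len(m.group(0)), pattern)


assembled = compress(overlay("cggn{8}ggc", "cggn{9}gcg"))
print(assembled)  # cggn{8}ggcg
```

The two right-hand trinucleotides overlap by two positions (gc), so the assembled right-hand part is the tetranucleotide ggcg.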
>data set dm08 >parameters -v 1 -thosig 0 -purge -org Drosophila_melanogaster -seq Drosophila_melanogaster_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Drosophila_melanogaster/rand_gene_selections >postprocessing After observation of the maps with pentanucleotides and heptanucleotides with sig>=-1, it seems reasonable to think that the hexanucleotide tagtaa is the constant core of a binding site, but is not sufficient (many occurrences: 6 in the first sequence, 2 in the second, and 5 in the third). We selected the 3 heptanucleotides forming the assembly made of 5nt + 6nt. >data set mus01 >parameters -v 1 -thosig 0 -purge -org Mus_musculus -seq Mus_musculus_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/rand_gene_selections >postprocessing The only significant pattern is not convincing (4 occurrences in 1 sequence and 0 in the 2 other ones). >data set mus02 >parameters -v 1 -thosig 0 -purge -org Mus_musculus -seq Mus_musculus_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/rand_gene_selections >postprocessing In most mouse families, there is not a single significant pattern when the threshold of sig >=0 is selected. This suggests that our calibration might be too stringent. We lowered the threshold to -1 to evaluate slightly less significant patterns. 
With mus02, we selected the most significant pattern, which is detected as heptanucleotide TCATGCA (sig=-0.37) and as hexanucleotide (CATGCA). >data set mus03 >parameters -v 1 -thosig 0 -purge -org Mus_musculus -seq Mus_musculus_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/rand_gene_selections >postprocessing In most mouse families, there is not a single significant pattern, when the threshold of sig >=0 is selected. This suggests that our calibration might be too stringent. We lowered the threshold to -1 to evaluate slightly less significant patterns. In mus03, a single heptanucleotide had a sig >= -1, we thus retained it. >data set mus04 >parameters -v 1 -thosig 0 -purge -org Mus_musculus -seq Mus_musculus_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/rand_gene_selections >postprocessing >data set mus05 >parameters -v 1 -thosig 0 -purge -org Mus_musculus -seq Mus_musculus_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/rand_gene_selections >postprocessing TO DISCUSS >data set mus06 >parameters -v 1 -thosig 0 -purge -org Mus_musculus -seq Mus_musculus_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov 
-calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/rand_gene_selections >postprocessing >data set mus07 >parameters -v 1 -thosig 0 -purge -org Mus_musculus -seq Mus_musculus_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/rand_gene_selections >postprocessing In most mouse families, there is not a single significant pattern, when the threshold of sig >=0 is selected. This suggests that our calibration might be too stringent. We lowered the threshold to -1 to evaluate slightly less significant patterns. In mus07, we selected the two most significant heptanucleotides: GAGGCCC and CTCGCTC. Hexanucleotides were not retained because their significance is weaker, and selecting them would result in many matches, likely to be spurious. 
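The reason why hexanucleotides tend to produce more spurious matches than heptanucleotides can be made concrete with a back-of-the-envelope estimate. The sketch below assumes independent, equiprobable nucleotides, which is much cruder than the calibrated background (calibN) used in the analysis, and the 1000 bp sequence length is a hypothetical value chosen only for illustration.

```python
# Sketch: expected number of chance matches of a k-mer in random sequence.
# Assumes uniform, independent nucleotides; ignores palindromic words.

def expected_chance_matches(k, seq_length, n_sequences=1, both_strands=True):
    """Expected matches of one specific k-mer occurring purely by chance."""
    positions = max(seq_length - k + 1, 0) * n_sequences
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** k


for k in (6, 7, 8):
    print(k, round(expected_chance_matches(k, seq_length=1000), 4))
```

Under these assumptions each additional nucleotide divides the expected number of chance matches by roughly four, so a hexanucleotide is expected to match about four times as often as a heptanucleotide in the same sequences.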
>data set mus08 >parameters -v 1 -thosig 0 -purge -org Mus_musculus -seq Mus_musculus_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/rand_gene_selections >postprocessing >data set mus09 >parameters -v 1 -thosig 0 -purge -org Mus_musculus -seq Mus_musculus_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/rand_gene_selections >postprocessing >data set mus10 >parameters -v 1 -thosig 0 -purge -org Mus_musculus -seq Mus_musculus_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/rand_gene_selections >postprocessing This pattern has a low sig, but is the only one with sig > -1, and its occurrences are well distributed over the input sequences. 
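Several notes above judge a pattern by how its occurrences are distributed over the input sequences. A minimal way to tabulate this is sketched below. The sequences are toy data invented for the example (not taken from any data set), the word ACAGCAA is borrowed from dm01 only as a convenient test word, and overlapping matches are counted, unlike the -noov option of the actual run.

```python
# Sketch: occurrences of a word per sequence, counted on both strands.
# The sequences below are made-up toy data.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word):
    """Return the reverse complement of a DNA word."""
    return "".join(COMPLEMENT[base] for base in reversed(word.upper()))


def count_both_strands(sequence, word):
    """Count positions where the word or its reverse complement starts."""
    sequence, word = sequence.upper(), word.upper()
    targets = {word, reverse_complement(word)}
    k = len(word)
    return sum(sequence[i:i + k] in targets
               for i in range(len(sequence) - k + 1))


def distribution(sequences, word):
    """Map each sequence name to its number of occurrences."""
    return {name: count_both_strands(seq, word)
            for name, seq in sequences.items()}


toy_sequences = {
    "seq_0": "GGACAGCAATT",
    "seq_1": "CCTTGCTGTAAACAGCAAGG",
    "seq_2": "GGGGCCCCTTTT",
}
counts = distribution(toy_sequences, "ACAGCAA")
covered = sum(1 for n in counts.values() if n > 0)
print(counts, f"{covered}/{len(counts)} sequences")
```

A pattern with all its occurrences concentrated in one sequence (as reported for mus01 or yst10) would show a single large count and zeros elsewhere in such a table.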
>data set mus11 >parameters -v 1 -thosig 0 -purge -org Mus_musculus -seq Mus_musculus_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/rand_gene_selections >postprocessing >data set mus12 >parameters -v 1 -thosig 0 -purge -org Mus_musculus -seq Mus_musculus_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Mus_musculus/rand_gene_selections >postprocessing >data set hm01 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing >data set hm02 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing The only significant pattern, CAGCACAC, is weakly significant (sig=0.52) and is only found in 3 out of 9 sequences. 
>data set hm03 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing The only significant pattern, ATAGTCTG, is weakly significant (sig=0.29), and it is found in 3/10 sequences only. >data set hm04 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections -minol 8 -maxol 8 >postprocessing The analysis of 5nt, 6nt and 7nt detected dozens of patterns covering the whole sequence, suggesting that the calibration is inappropriate for this data set (same as for hm18). We restricted the analysis to octanucleotides, and selected the most significant pattern assembly made of two octanucleotides AAACCCGC (1.52) and AACCCGCA (1.68). The octanucleotide GGCGGAGA was also retained because it is reasonably significant, and found in most sequences. >data set hm05 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing No significant pattern is detected with sig >=0, probably due to the small number of sequences. When the threshold is lowered to 0, some patterns are selected. 
The most significant among these, TAGGAGT, is found in the three sequences, and in 2/4 cases, it is extended by an overlapping pattern GGAGTAT. This could be a reasonable candidate. >data set hm06 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing >data set hm07 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing A single heptanucleotide is reasonably significant, and is found in 4/5 sequences. >data set hm08 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing We only retained the most significant pattern (TGACGT). 
>data set hm09 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing >data set hm10 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing The selected patterns had a very weak significance, and each was found on a small number of sequences only. >data set hm11 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing >data set hm12 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing Two octanucleotides (AATTGACG and ACGTCATG) were selected with sig>=0, but present only on one sequence (seq_0). To improve sensitivity, we looked at patterns with sig >= -1. 
Many 6nt, 7nt and 8nt were selected with sig >= -1, which aligned with the core octanucleotide AATTGACG. We selected the oligos of this cluster for pattern matching, which allowed us to predict a match on seq_1, in addition to the two matches found in seq_0 with sig >= 0. >data set hm13 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing >data set hm14 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing Not really convincing, but it's not easy to find overrepresented motifs in a set of 2 sequences only. We think this octanucleotide could be a TF binding site because of the distribution of its occurrences. 
>data set hm15 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing >data set hm16 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing >data set hm17 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing All patterns detected align with the core AATTCCCG (sig= 3.56), and their matching locations overlap. We retain all of them. >data set hm18 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections -minol 7 -maxol 8 >postprocessing For this data set, the analysis of 5nt and 6nt returned too many patterns, which covered the whole sequence. 
When the analysis is restricted to 8nt only, a single pattern (GCGGAGAA, sig=1.83) is selected, which covers 3 sequences out of 5. When 7nt and 8nt are taken together, the second most significant pattern (ACTGCGC, sig=1.32) is assembled with several other significant patterns. We retained this pattern cluster centered on ACTGCGC, and the significant octanucleotide GCGGAGAA. >data set hm19 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing >data set hm20 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing >data set hm21 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing >data set hm22 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections 
>postprocessing The 2 most significant patterns could be relevant. >data set hm23 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing Not really convincing; we simply selected the best match. >data set hm24 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing The selected patterns had a very weak significance, and each was found on a small number of sequences only. >data set hm25 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing The only pattern with significance > 0 is not convincing (5 occurrences in 1 sequence and 0 in the other one). 
We then chose the patterns gctata and aaggac with lower sig, but with a good distribution (1 occurrence in seq 0, and 2 in seq 1, with similar distance from the start codon). >data set hm26 >parameters -v 1 -thosig 0 -purge -org Homo_sapiens -seq Homo_sapiens_files.txt -outdir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/multi/calibN_bg-purge -2str -minol 6 -maxol 8 -minsp 0 -maxsp 20 -bg calibN -sort score -task report,synthesis -noov -calib_dir /Users/jvanheld/motif_discovery_competition_2003/results/Homo_sapiens/rand_gene_selections >postprocessing We selected the two most significant patterns, which have a sig > 1.
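
For readers who want to process this log programmatically, the records can be split on the four field markers used throughout (`>program name`, `>data set`, `>parameters`, `>postprocessing`). The parser below is a minimal sketch, not a tool from the original pipeline. The function name is hypothetical, and the sample text abbreviates the parameter strings for brevity.

```python
# Sketch: split the flattened log into one record per data set.
import re

# A ">" only counts as a field marker when followed by a known field name,
# so expressions such as "sig > -1" in the notes are left untouched.
FIELD = re.compile(r">(program name|data set|parameters|postprocessing)\s*")


def parse_records(text):
    """Return (program_name, list of per-data-set dicts)."""
    parts = FIELD.split(text)  # [prefix, key, value, key, value, ...]
    program, records, current = None, [], None
    for key, value in zip(parts[1::2], parts[2::2]):
        value = " ".join(value.split())  # normalise whitespace and newlines
        if key == "program name":
            program = value
        elif key == "data set":
            current = {"data set": value, "parameters": "", "postprocessing": ""}
            records.append(current)
        elif current is not None:
            current[key] = value
    return program, records


# Abbreviated sample: the real parameter strings are much longer.
sample = (
    ">program name oligo/dyad-analysis "
    ">data set yst04 >parameters -v 1 -thosig 0 "
    ">postprocessing We selected the most significant pattern "
    ">data set yst09 >parameters -v 1 -thosig 0 "
    ">postprocessing Nothing convincing"
)
program, records = parse_records(sample)
for record in records:
    print(record["data set"], "|", record["postprocessing"])
```

Because whitespace is normalised before storing each value, the same function should also cope with records whose parameters are wrapped across physical lines, as happens in this file.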