>program name QuickScore
>data set dm01
>parameters
P(A) = .2906573694 P(C) = .2124442017 P(G) = .2103059038 P(T) = .2865925251
p(A,A) = .3503935439 p(A,C) = .1829443281 p(A,G) = .1886272511 p(A,T) = .2780348771
p(C,A) = .3217899577 p(C,C) = .2174287868 p(C,G) = .2084306830 p(C,T) = .2523505723
p(G,A) = .2644833721 p(G,C) = .2719299283 p(G,G) = .2117875468 p(G,T) = .2517991530
p(T,A) = .2264065301 p(T,C) = .1951177435 p(T,G) = .2325778654 p(T,T) = .3458978609
>preprocessing description
>postprocessing description
>data set dm02
>parameters
>preprocessing description
>postprocessing description
>data set dm03
>parameters
P(A) = .2906573694 P(C) = .2124442017 P(G) = .2103059038 P(T) = .2865925251
p(A,A) = .3503935439 p(A,C) = .1829443281 p(A,G) = .1886272511 p(A,T) = .2780348771
p(C,A) = .3217899577 p(C,C) = .2174287868 p(C,G) = .2084306830 p(C,T) = .2523505723
p(G,A) = .2644833721 p(G,C) = .2719299283 p(G,G) = .2117875468 p(G,T) = .2517991530
p(T,A) = .2264065301 p(T,C) = .1951177435 p(T,G) = .2325778654 p(T,T) = .3458978609
>preprocessing description
>postprocessing description
>data set dm04
>parameters
P(A) = .2906573694 P(C) = .2124442017 P(G) = .2103059038 P(T) = .2865925251
p(A,A) = .3503935439 p(A,C) = .1829443281 p(A,G) = .1886272511 p(A,T) = .2780348771
p(C,A) = .3217899577 p(C,C) = .2174287868 p(C,G) = .2084306830 p(C,T) = .2523505723
p(G,A) = .2644833721 p(G,C) = .2719299283 p(G,G) = .2117875468 p(G,T) = .2517991530
p(T,A) = .2264065301 p(T,C) = .1951177435 p(T,G) = .2325778654 p(T,T) = .3458978609
>preprocessing
Threshold on the number of occurrences on both strands: 14
Threshold on p-value: .0004
First choice = .000466439149 (TATAAAT)
>postprocessing
Two groups with p-value above .0020 or p-value in the range .0004 to .0005.
The first group is CG(CA)^n: these small repeats are exceptional but are not binding sites.
The second group contains some artefacts of the first group and motifs with a core GAAAA.
It turns out that pvalue(GAAAA) ~ pvalue(NGAAAA) ~ pvalue(GAAAAN), although the words NGAAAA and GAAAAN are longer.
Additionally, the left and right neighbours seem to be random.
We discarded the TATAAT box, which has a good p-value as a 6-word.
>data set dm05
>parameters
P(A) = .2906573694 P(C) = .2124442017 P(G) = .2103059038 P(T) = .2865925251
p(A,A) = .3503935439 p(A,C) = .1829443281 p(A,G) = .1886272511 p(A,T) = .2780348771
p(C,A) = .3217899577 p(C,C) = .2174287868 p(C,G) = .2084306830 p(C,T) = .2523505723
p(G,A) = .2644833721 p(G,C) = .2719299283 p(G,G) = .2117875468 p(G,T) = .2517991530
p(T,A) = .2264065301 p(T,C) = .1951177435 p(T,G) = .2325778654 p(T,T) = .3458978609
>preprocessing
Threshold on the number of occurrences on both strands: 14
Threshold on p-value: .000317381084
First choice = .001218042570
>postprocessing
No good words of size 7.
>data set dm07
>parameters
P(A) = .2906573694 P(C) = .2124442017 P(G) = .2103059038 P(T) = .2865925251
p(A,A) = .3503935439 p(A,C) = .1829443281 p(A,G) = .1886272511 p(A,T) = .2780348771
p(C,A) = .3217899577 p(C,C) = .2174287868 p(C,G) = .2084306830 p(C,T) = .2523505723
p(G,A) = .2644833721 p(G,C) = .2719299283 p(G,G) = .2117875468 p(G,T) = .2517991530
p(T,A) = .2264065301 p(T,C) = .1951177435 p(T,G) = .2325778654 p(T,T) = .3458978609
>preprocessing
Threshold on the number of occurrences on both strands: 8
Threshold on p-value: .0008
First choice = .001123554500
>postprocessing
Discard TAAAAA and TTTTTA: these are artefacts of the numerous poly-T runs.
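Several records in this file discard words such as TAAAAA, TTTTTA, GAAAAA or TTTTCT as artefacts of poly-A/poly-T runs. The sketch below shows one way such a filter could be expressed. It is an illustrative heuristic, not the rule used by QuickScore: the function name and the `max_mismatches` parameter are assumptions introduced here.

```python
from collections import Counter


def looks_like_homopolymer_artefact(word, max_mismatches=1):
    """Heuristic (an assumption, not QuickScore's rule): flag a word when
    all but `max_mismatches` of its letters are the same nucleotide."""
    most_common_count = Counter(word).most_common(1)[0][1]
    return len(word) - most_common_count <= max_mismatches


# Words discarded in this file as poly-A/T artefacts are flagged,
# while a word such as AGTAAT is not.
flagged = [w for w in ["TAAAAA", "TTTTTA", "TTTTCT", "AGTAAT"]
           if looks_like_homopolymer_artefact(w)]
```

A stricter variant could require the mismatch to sit at one end of the word, which would flag TAAAAA but not TTTTCT.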
>data set dm08
>parameters
P(A) = .2906573694 P(C) = .2124442017 P(G) = .2103059038 P(T) = .2865925251
p(A,A) = .3503935439 p(A,C) = .1829443281 p(A,G) = .1886272511 p(A,T) = .2780348771
p(C,A) = .3217899577 p(C,C) = .2174287868 p(C,G) = .2084306830 p(C,T) = .2523505723
p(G,A) = .2644833721 p(G,C) = .2719299283 p(G,G) = .2117875468 p(G,T) = .2517991530
p(T,A) = .2264065301 p(T,C) = .1951177435 p(T,G) = .2325778654 p(T,T) = .3458978609
>preprocessing
Threshold on the number of occurrences on both strands: 11
Threshold on p-value: .0008
First choice = .001120046068
4 candidates:
ATATAT .0018
TTACTA/TAGTAA .001529027634
AGTAAT/ATTACT .001120046068
TATATA .0011
Two candidates have a core AGTAA/TTACT. There is a dominating extension WAGTAAW, which is preferred to the choice TAGTAA.
>postprocessing
Discard ATATAT (.0018) as a TATA box.
>data set yst01
>parameters
QM["A","A"]:= 0.3573820282; QM["A","C"]:= 0.1714697507; QM["A","G"]:= 0.1875215098; QM["A","T"]:= 0.2836267114;
QM["C","A"]:= 0.3288487530; QM["C","C"]:= 0.2001481981; QM["C","G"]:= 0.1629351509; QM["C","T"]:= 0.3080678979;
QM["G","A"]:= 0.3167924002; QM["G","C"]:= 0.2096274934; QM["G","G"]:= 0.1990282435; QM["G","T"]:= 0.2745518630;
QM["T","A"]:= 0.2535413829; QM["T","C"]:= 0.1939129242; QM["T","G"]:= 0.1940379836; QM["T","T"]:= 0.3585077093;
Vstat["A"] := 0.3119841035; Vstat["C"] := 0.1910683665; Vstat["G"] := 0.1870428315; Vstat["T"] := 0.3099046986;
>preprocessing
>postprocessing
Threshold on the number of occurrences on both strands: 14
Threshold on p-value: .0003
First choice: .000328109412 (AACAAA)
Discarded poly-T and poly-A motifs.
Discarded TGTATA, which is an artefact of TATATG.
Discarded 3-periodic pattern CTTCTT and its complement.
Discarded TTGTTT and TTTGTT as overlapping patterns.
AAGAAA is overlapping with GCAAGA.
>data set yst02
>parameters
QM["A","A"]:= 0.3573820282; QM["A","C"]:= 0.1714697507; QM["A","G"]:= 0.1875215098; QM["A","T"]:= 0.2836267114;
QM["C","A"]:= 0.3288487530; QM["C","C"]:= 0.2001481981; QM["C","G"]:= 0.1629351509; QM["C","T"]:= 0.3080678979;
QM["G","A"]:= 0.3167924002; QM["G","C"]:= 0.2096274934; QM["G","G"]:= 0.1990282435; QM["G","T"]:= 0.2745518630;
QM["T","A"]:= 0.2535413829; QM["T","C"]:= 0.1939129242; QM["T","G"]:= 0.1940379836; QM["T","T"]:= 0.3585077093;
Vstat["A"] := 0.3119841035; Vstat["C"] := 0.1910683665; Vstat["G"] := 0.1870428315; Vstat["T"] := 0.3099046986;
>preprocessing
>postprocessing
Threshold on p-value: 0.0020
Discarded TATATA, ATATAT, poly-T and poly-A.
Word with best p-value: TCGGAG (.003675395645)
Discarded CTGGAG as an artefact of TCGGAG.
TTTTTC (.002195553508) and GAAAAA (.002195553508) are artefacts of poly-T and poly-A.
ATCGGA has a good p-value, but overlaps with the best word (TCGGAG), so it is discarded.
>data set yst03
>parameters
QM["A","A"]:= 0.3573820282; QM["A","C"]:= 0.1714697507; QM["A","G"]:= 0.1875215098; QM["A","T"]:= 0.2836267114;
QM["C","A"]:= 0.3288487530; QM["C","C"]:= 0.2001481981; QM["C","G"]:= 0.1629351509; QM["C","T"]:= 0.3080678979;
QM["G","A"]:= 0.3167924002; QM["G","C"]:= 0.2096274934; QM["G","G"]:= 0.1990282435; QM["G","T"]:= 0.2745518630;
QM["T","A"]:= 0.2535413829; QM["T","C"]:= 0.1939129242; QM["T","G"]:= 0.1940379836; QM["T","T"]:= 0.3585077093;
Vstat["A"] := 0.3119841035; Vstat["C"] := 0.1910683665; Vstat["G"] := 0.1870428315; Vstat["T"] := 0.3099046986;
>preprocessing
>postprocessing
Threshold on p-value: 0.0013
Word with best p-value: TGGTGC (.002828301245), and its complement, but they occur in only 2 sequences out of 8 in total.
Word with the 2nd best p-value: TGACTC (.002551911534) and its complement occur in all sequences except sequences number 3 and 8.
All other words with a good p-value do not occur in the majority of the sequences.
Conclusion: only word TGACTC is reported.
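The yst03 decision rests on which sequences contain a word or its complement. A minimal sketch of that check is given below. The helper names are hypothetical and the toy sequences are made up for illustration; they are not the yst03 data.

```python
def reverse_complement(word):
    """Reverse complement of a DNA word over the alphabet A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(word))


def sequences_missing(word, sequences):
    """Return the 1-based numbers of the sequences that contain neither
    `word` nor its reverse complement."""
    rc = reverse_complement(word)
    return [number for number, seq in enumerate(sequences, start=1)
            if word not in seq and rc not in seq]


# Toy input (not the yst03 sequences): the third sequence lacks the motif.
toy_sequences = ["AATGACTCAA", "GGGAGTCAGG", "ACGTACGT"]
missing = sequences_missing("TGACTC", toy_sequences)
```

A word would then be kept only if `missing` is short relative to the number of sequences; the exact cut-off is a judgment call in the records above.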
>data set yst04
>parameters
QM["A","A"]:= 0.3573820282; QM["A","C"]:= 0.1714697507; QM["A","G"]:= 0.1875215098; QM["A","T"]:= 0.2836267114;
QM["C","A"]:= 0.3288487530; QM["C","C"]:= 0.2001481981; QM["C","G"]:= 0.1629351509; QM["C","T"]:= 0.3080678979;
QM["G","A"]:= 0.3167924002; QM["G","C"]:= 0.2096274934; QM["G","G"]:= 0.1990282435; QM["G","T"]:= 0.2745518630;
QM["T","A"]:= 0.2535413829; QM["T","C"]:= 0.1939129242; QM["T","G"]:= 0.1940379836; QM["T","T"]:= 0.3585077093;
Vstat["A"] := 0.3119841035; Vstat["C"] := 0.1910683665; Vstat["G"] := 0.1870428315; Vstat["T"] := 0.3099046986;
>preprocessing
>postprocessing
Threshold on p-value: .0015
Discarded poly-T and poly-A; discarded TTTTTC and GAAAAA as artefacts of poly-A/T.
Word with best p-value: TCTTGT (.002615237407) and its complement (ACAAGA): they occur in all but sequence number 4.
Next words: discarded 3-periodic words CTTCTT and AGGAGG; discarded TCTTTT, AAAAGA, TTTTCT and AGAAAA as artefacts of poly-T/A; discarded TATGAG (.001774805884) and TAGCTA (.001778244752), which occur in only 3 sequences (3, 4 and 5).
>data set yst05
>parameters
QM["A","A"]:= 0.3573820282; QM["A","C"]:= 0.1714697507; QM["A","G"]:= 0.1875215098; QM["A","T"]:= 0.2836267114;
QM["C","A"]:= 0.3288487530; QM["C","C"]:= 0.2001481981; QM["C","G"]:= 0.1629351509; QM["C","T"]:= 0.3080678979;
QM["G","A"]:= 0.3167924002; QM["G","C"]:= 0.2096274934; QM["G","G"]:= 0.1990282435; QM["G","T"]:= 0.2745518630;
QM["T","A"]:= 0.2535413829; QM["T","C"]:= 0.1939129242; QM["T","G"]:= 0.1940379836; QM["T","T"]:= 0.3585077093;
Vstat["A"] := 0.3119841035; Vstat["C"] := 0.1910683665; Vstat["G"] := 0.1870428315; Vstat["T"] := 0.3099046986;
>preprocessing
>postprocessing
Threshold on number of occurrences of a word and its complement: 4
Threshold on p-value: .0025
Discarded poly-AT/TA, AAATTT, TAATAA, TTTTCT.
Word with best p-value: ACATAT (.005626697725) and its complement ATATGT.
All other words have much lower p-values (<0.0034).
>data set yst06
>parameters
QM["A","A"]:= 0.3573820282; QM["A","C"]:= 0.1714697507; QM["A","G"]:= 0.1875215098; QM["A","T"]:= 0.2836267114;
QM["C","A"]:= 0.3288487530; QM["C","C"]:= 0.2001481981; QM["C","G"]:= 0.1629351509; QM["C","T"]:= 0.3080678979;
QM["G","A"]:= 0.3167924002; QM["G","C"]:= 0.2096274934; QM["G","G"]:= 0.1990282435; QM["G","T"]:= 0.2745518630;
QM["T","A"]:= 0.2535413829; QM["T","C"]:= 0.1939129242; QM["T","G"]:= 0.1940379836; QM["T","T"]:= 0.3585077093;
Vstat["A"] := 0.3119841035; Vstat["C"] := 0.1910683665; Vstat["G"] := 0.1870428315; Vstat["T"] := 0.3099046986;
>preprocessing
>postprocessing
Threshold on number of occurrences of a word and its complement: 4
Threshold on p-value: .0015
Discarded poly-AT/TA, poly-T/A.
Words CTTTTT, AAAAAG, TTTTTC, GAAAAA are artefacts of poly-A/T.
Word with best p-value: TTCTCT (.002545592449) and its complement AGAGAA: they occur in all sequences.
All other words are very rich in T nucleotides.
>data set yst07
>parameters
QM["A","A"]:= 0.3573820282; QM["A","C"]:= 0.1714697507; QM["A","G"]:= 0.1875215098; QM["A","T"]:= 0.2836267114;
QM["C","A"]:= 0.3288487530; QM["C","C"]:= 0.2001481981; QM["C","G"]:= 0.1629351509; QM["C","T"]:= 0.3080678979;
QM["G","A"]:= 0.3167924002; QM["G","C"]:= 0.2096274934; QM["G","G"]:= 0.1990282435; QM["G","T"]:= 0.2745518630;
QM["T","A"]:= 0.2535413829; QM["T","C"]:= 0.1939129242; QM["T","G"]:= 0.1940379836; QM["T","T"]:= 0.3585077093;
Vstat["A"] := 0.3119841035; Vstat["C"] := 0.1910683665; Vstat["G"] := 0.1870428315; Vstat["T"] := 0.3099046986;
>preprocessing
>postprocessing
Threshold on number of occurrences of a word and its complement: 4
Threshold on p-value: 0.0015
Word with best p-value: CGGTGG (.002037684164) and its complement CCACCA, but they occur only 4 times (in 4 sequences out of 6).
Among its subwords, the word CCAC appears 17 times and has a good distribution in the sequences.
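The occurrence thresholds in these records count a word together with its complement. The sketch below shows one way to obtain such a count, including overlapping occurrences. The function names are hypothetical, and counting a palindromic word only once per position is an assumption made here, not a documented QuickScore convention.

```python
def reverse_complement(word):
    """Reverse complement of a DNA word over the alphabet A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(word))


def count_overlapping(word, sequence):
    """Count occurrences of `word` in `sequence`, overlaps included."""
    count = 0
    start = sequence.find(word)
    while start != -1:
        count += 1
        start = sequence.find(word, start + 1)
    return count


def count_both_strands(word, sequences):
    """Total occurrences of `word` and its reverse complement.
    A palindromic word is counted once per position (assumption)."""
    rc = reverse_complement(word)
    total = 0
    for seq in sequences:
        total += count_overlapping(word, seq)
        if rc != word:
            total += count_overlapping(rc, seq)
    return total


# Toy input: CCAC occurs once, and its reverse complement GTGG once.
toy_total = count_both_strands("CCAC", ["CCACGTGG", "TTTT"])
```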
>data set yst08
>parameters
QM["A","A"]:= 0.3573820282; QM["A","C"]:= 0.1714697507; QM["A","G"]:= 0.1875215098; QM["A","T"]:= 0.2836267114;
QM["C","A"]:= 0.3288487530; QM["C","C"]:= 0.2001481981; QM["C","G"]:= 0.1629351509; QM["C","T"]:= 0.3080678979;
QM["G","A"]:= 0.3167924002; QM["G","C"]:= 0.2096274934; QM["G","G"]:= 0.1990282435; QM["G","T"]:= 0.2745518630;
QM["T","A"]:= 0.2535413829; QM["T","C"]:= 0.1939129242; QM["T","G"]:= 0.1940379836; QM["T","T"]:= 0.3585077093;
Vstat["A"] := 0.3119841035; Vstat["C"] := 0.1910683665; Vstat["G"] := 0.1870428315; Vstat["T"] := 0.3099046986;
>preprocessing
>postprocessing
Threshold on number of occurrences of a word and its complement: 4
Threshold on p-value: .0010
Discarding poly-T/A. TTTTTC, GAAAAA, TTTTCT and AGAAAA are artefacts of poly-T/A.
Word with best p-value is CTTCCT (.001475539220) and its complement AGGAAG (.001475539220), which occur 24 times in the sequences.
Most obvious pattern is CACG, which occurs 56 times in the sequences.
>data set yst09
>parameters
QM["A","A"]:= 0.3573820282; QM["A","C"]:= 0.1714697507; QM["A","G"]:= 0.1875215098; QM["A","T"]:= 0.2836267114;
QM["C","A"]:= 0.3288487530; QM["C","C"]:= 0.2001481981; QM["C","G"]:= 0.1629351509; QM["C","T"]:= 0.3080678979;
QM["G","A"]:= 0.3167924002; QM["G","C"]:= 0.2096274934; QM["G","G"]:= 0.1990282435; QM["G","T"]:= 0.2745518630;
QM["T","A"]:= 0.2535413829; QM["T","C"]:= 0.1939129242; QM["T","G"]:= 0.1940379836; QM["T","T"]:= 0.3585077093;
Vstat["A"] := 0.3119841035; Vstat["C"] := 0.1910683665; Vstat["G"] := 0.1870428315; Vstat["T"] := 0.3099046986;
>preprocessing
>postprocessing
Threshold on number of occurrences of a word and its complement: 10
Threshold on p-value: .00035
Discarding poly-AT/TA and 2-periodic patterns CTTCCT, CTTCTT.
Word with best p-value: TTCTTC|GAAGAA (.001100023674)
Word with good p-value: TGCTAT|ATAGCA (.000833383503)
>data set yst10
>parameters
QM["A","A"]:= 0.3573820282; QM["A","C"]:= 0.1714697507; QM["A","G"]:= 0.1875215098; QM["A","T"]:= 0.2836267114;
QM["C","A"]:= 0.3288487530; QM["C","C"]:= 0.2001481981; QM["C","G"]:= 0.1629351509; QM["C","T"]:= 0.3080678979;
QM["G","A"]:= 0.3167924002; QM["G","C"]:= 0.2096274934; QM["G","G"]:= 0.1990282435; QM["G","T"]:= 0.2745518630;
QM["T","A"]:= 0.2535413829; QM["T","C"]:= 0.1939129242; QM["T","G"]:= 0.1940379836; QM["T","T"]:= 0.3585077093;
Vstat["A"] := 0.3119841035; Vstat["C"] := 0.1910683665; Vstat["G"] := 0.1870428315; Vstat["T"] := 0.3099046986;
>preprocessing
>postprocessing
Threshold on number of occurrences of a word and its complement: 5
Threshold on p-value: .0010
Word with best p-value: TTTCAT|ATGAAA (.003269330926), occurs 10 times (not in sequence number 1).
>data set mus01
>parameters
Pstat[A] := 0.2765053183; Pstat[C] := 0.2204240307; Pstat[G] := 0.2230384134; Pstat[T] := 0.2800322376;
P(A,A):= 0.3014936940; P(A,C):= 0.1911214065; P(A,G):= 0.2725827476; P(A,T):= 0.2348021519;
P(C,A):= 0.3335065426; P(C,C):= 0.2660213354; P(C,G):= 0.05887179660; P(C,T):= 0.3416003254;
P(G,A):= 0.2789669073; P(G,C):= 0.2110676555; P(G,G):= 0.2659267903; P(G,T):= 0.2440386469;
P(T,A):= 0.2049462450; P(T,C):= 0.2210619002; P(T,G):= 0.2692219816; P(T,T):= 0.3047698733;
Threshold on the number of occurrences on both strands: 1
Threshold on p-value: .003
First choice = .003041020156 (AAGCCA/TGGCTT)
Best choice: .004991301512 (CCACAG/CTGTGG)
>preprocessing description
>postprocessing
The first choice occurs in all sequences.
The second choice, .003765413255 for (GAGGGA/TCCCTC), has a much smaller rate function.
The first choice is kept.
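The parameter blocks list a stationary distribution and a 4x4 table whose rows sum to 1 (row A of the mus01 table, for instance), which is consistent with a first-order Markov background model. Under that reading, the probability of a word at a fixed position can be sketched as follows. This computes only the background word probability; it is not the QuickScore p-value or rate function, and the function name and dictionary layout are assumptions made for illustration.

```python
import math


def word_probability(word, stationary, transition):
    """Probability of `word` at a fixed position under a first-order
    Markov chain: stationary[w1] * product of transition[(w_i, w_{i+1})]."""
    prob = stationary[word[0]]
    for prev, nxt in zip(word, word[1:]):
        prob *= transition[(prev, nxt)]
    return prob


# Toy uniform background (NOT the values listed in this file).
bases = "ACGT"
uniform_stat = {b: 0.25 for b in bases}
uniform_trans = {(a, b): 0.25 for a in bases for b in bases}

p_uniform = word_probability("AAGCCA", uniform_stat, uniform_trans)
assert math.isclose(p_uniform, 0.25 ** 6)
```

To use the values from a record, `stationary` would be filled from the `Pstat`/`Vstat`/`P(X)` entries and `transition[(X, Y)]` from the `P(X,Y)`/`QM["X","Y"]` entries.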
>data set mus02
>parameters
Threshold on the number of occurrences on both strands: 7
Threshold on p-value: .00119
First choice = .001410936638 (AATAGC/GCTATT)
>preprocessing description
>postprocessing
Best choice .001862135504 -> .002961887514
Poly-T, poly-A and 2-repeats discarded.
The next three are very close:
AATAGC/GCTATT .001410936638
AGACCA/TGGTCT .001415089814
AAGCAT/ATGCTT .001452904339
Numbers 2 and 3 occur in most of the sequences.
We assume coregulation by 2 TFs on 2 binding sites, AAGCAT/ATGCTT and AGACCA/TGGTCT.
>data set mus03
>parameters
Threshold on the number of occurrences on both strands: 3
Threshold on p-value: .001794765601
First choice = .001947256165 (ACGCCC/GGGCGT)
Best choice: .002763305584 (CGGCTC/GAGCCG)
>preprocessing description
>postprocessing
The next three are very close:
CCGCGG .0026
CGCCCC/GGGGCG .002698541287
CAGCTC/GAGCTG .002641491073
CCGCGG and CGCCCC/GGGGCG appear inside GC sequences.
CAGCTC/GAGCTG appears in many sequences, and it is very close to the first choice.
We assume a degenerate site CAGCYG/CRGCTC.
>data set mus04
>parameters
>data set mus05
>parameters
Threshold on the number of occurrences on both strands: 3
Threshold on p-value: .0028
First choice = .004087695494 (ATTTTC/GAAAAT)
Best choice: .004552700488 (CCCCGC/GCGGGG)
>preprocessing description
>postprocessing
The first three scores are very close, and far from the next ones.
But the first two scores:
(ACGCCC/GGGCGT) .004210985257
(CCCCGC/GCGGGG) .004552700488
occur only in sequence 2, while (ATTTTC/GAAAAT) occurs in 3 sequences (0, 1, 3).
Additionally, (ACGCCC/GGGCGT) and (CCCCGC/GCGGGG) look like parts of CG-sequences.
(ATTTTC/GAAAAT) is the best.
>data set mus06
>parameters
Threshold on the number of occurrences on both strands: 2
Threshold on p-value: .0039
First choice = .003936072191 (CCCTCC/GGAGGG)
Best choice: .007549220593 (CCCGGC/GCCGGG)
>preprocessing description
>postprocessing
Best scoring motif occurs in only 4 positions in 2 sequences.
The next three scores are very close:
GACAGA/TCTGTC .004107410899
ACGCGG/CCGCGT .004236425053
CTTATC/GATAAG .004723258205
Nevertheless, ACGCGG/CCGCGT occurs in only 2 positions in 2 sequences.
GACAGA/TCTGTC and CTTATC/GATAAG are alike.
>data set mus07
>parameters
Threshold on the number of occurrences on both strands: 10
Threshold on p-value: .0027
First choice = .003936072191 (GCCGCC/GGCGGC)
Best choice: .007581339477 (CCCCGC/GCGGGG)
>preprocessing description
>postprocessing
All best scoring motifs occur in overlapping positions in CG islands.
Poly-GA and poly-TC are discarded.
Among the next best scores, CCGAG/CTCGG appears as prefix and suffix.
Its p-value as a 5-mer is .2787271488e-2, which is comparable to the next best score on 6-mers, .003950931836, for (CCCTCC/GGAGGG).
The overall distribution on sequences is better for CCGAG/CTCGG, as CCCTCC/GGAGGG appears mainly in the first sequence.
>data set mus08
>parameters
Threshold on the number of occurrences on both strands: 5
Threshold on p-value: .0020
First choice = .002015167479 (CACGGG/CCCGTG)
Best choice: .007581339477 (CCCCGC/GCGGGG)
>preprocessing description
>postprocessing
All best scoring motifs occur in overlapping positions in CG islands.
Poly-GA and poly-TC are discarded.
Among the next best scores, CCGAG/CTCGG appears as prefix and suffix.
Its p-value as a 5-mer is .2787271488e-2, which is comparable to the next best score on 6-mers, .003950931836, for (CCCTCC/GGAGGG).
The overall distribution on sequences is better for CCGAG/CTCGG, as CCCTCC/GGAGGG appears mainly in the first sequence.
>data set mus10
>parameters
Threshold on the number of occurrences on both strands: 10
Threshold on p-value: .0010
First choice = .001006847096 (ACTCCG/CGGAGT)
Best choice: .004286489985 (CCGCCC/GGGCGG)
>preprocessing description
>postprocessing
Best scoring motifs are CG islands and mutual artefacts.
Poly-AC/GT are discarded.
The 5-mer CGGAG/CTCCG appears as a suffix and prefix of the next scoring 6-mers.
Its p-value is .1836829466e-2, which is as good as the 3rd score for 6-mers. It is chosen.
>data set hm02
>parameters
P(A,A)= 0.3136132381; P(A,C)= 0.1812953330; P(A,G)= 0.2632829234; P(A,T)= 0.2418085054;
P(C,A)= 0.3250358783; P(C,C)= 0.2799258801; P(C,G)= 0.07003139343; P(C,T)= 0.3250068482;
P(G,A)= 0.2678423659; P(G,C)= 0.2258896136; P(G,G)= 0.2798820365; P(G,T)= 0.2263859840;
P(T,A)= 0.2013581011; P(T,C)= 0.2163578741; P(T,G)= 0.2650744616; P(T,T)= 0.3172095632;
P(A) = 0.2747031754; P(C) = 0.2230212026; P(G) = 0.2244012540; P(T) = 0.2778743681;
Threshold on the number of occurrences on both strands: 9
Threshold on p-value: .0020
Best choice: .004275006126 for (CCTCCC/GGGAGG)
>preprocessing description
>postprocessing
>data set hm03
>parameters
Threshold on the number of occurrences on both strands: 14
Threshold on p-value: .00080
Best choice: .003668461190 for (AGGGAA/TTCCCT)
>preprocessing description
>postprocessing
None. Comment: this 6-mer outperforms other 6-mers.
>data set hm04
>parameters
>preprocessing
>postprocessing
Threshold for number of occurrences of a motif and its complement: 13
Threshold for p-value: .0023
Word with best p-value is GCGCGG|CCGCGC (.005965442856)
Words with good p-value: GGCGGC|GCCGCC (.004910341349)
Other words are overlapping (CGGCGG, GCGGCC, ...)
First non-GC rich motif is GCGGAG|CTCCGC (.003596741733)
>data set hm05
>parameters
Threshold on the number of occurrences on both strands: 6
Threshold on p-value: .005252184215
Best choice: .008778758525 for (CGCGGC/GCCGCG)
>preprocessing description
>postprocessing
Discard CG islands.
Second score is .006901060255 for (CACCGC/GCGGTG).
The first sequence does not contain CACCGC/GCGGTG but contains CGGTG.
The p-value of (CACCG/CGGTG) is .3906793498e-2.
This motif gives a better distribution over the sequences.
>data set hm06
>parameters
Threshold on the number of occurrences on both strands: 6
Threshold on p-value: .004
Best choice: .010180688860 for (CGGCGC/GCGCCG)
>preprocessing description
>postprocessing
Discard CG islands (they occur in few sequences).
Second score is .004849406094 for (CCTCCC/GGGAGG), which has a good distribution over the sequences.
>data set hm07
>parameters
>preprocessing
>postprocessing
Threshold for number of occurrences of a motif and its complement: 3
Threshold for p-value: .0020
Discarding GC-rich patterns.
First word (non-GC rich): CTCCCG|CGGGAG (.004548403809)
Second: CCGCTG|CAGCGG (.003118281945)
>data set hm08
>parameters
>preprocessing
>postprocessing
First motif which is not GC-rich: GCGGAG|CTCCGC
All other motifs are GC-rich.
>data set hm09
>parameters
Threshold on the number of occurrences on both strands: 14
Threshold on p-value: .001
Best choice: .005245488424 for (CCGCCC/GGGCGG)
>preprocessing description
>postprocessing
Discard CG islands.
Second score is .003149883687 for (CTCCAC/GTGGAG), which has a good distribution over the sequences.
>data set hm13
>parameters
Threshold on the number of occurrences on both strands: 6
Threshold on p-value: .001
Best choice: .004329715076 for (ACAGGA/TCCTGT)
>preprocessing description
>postprocessing
None. Comment: next score is only .001490985511 for (GCAAAC/GTTTGC)
>data set hm15
>parameters
Threshold on the number of occurrences on both strands: 6
Threshold on p-value: .001
Best choice: .007914009138 for (CCCTCC/GGAGGG)
>preprocessing description
>postprocessing
None
>data set hm21
>parameters
>preprocessing
>postprocessing
Threshold for number of occurrences of a motif and its complement: 4
Threshold for p-value: .0025
Best p-value for non-GC word GCGGAG|CTCCGC (.003354080175)
Next word GCGAGG|CCTCGC (.003352389843)
Next word (GTGGGG) is an artefact of poly-G/C.
>data set hm20
>parameters
>preprocessing
>postprocessing
Considering words of size 8.
Threshold for number of occurrences of a motif and its complement: 8
Threshold for p-value:
Discarded poly-T/A, poly-TA/AT.
Word with best p-value: CTGTAATC|GATTACAG (.1812774404e-2), which, as a 10-word, is flanked by a C (left and right extension), resp. a G (left and right extension).
Word AGGCTGAG|CTCAGCCT (.9378508417e-3) contains TCAGCC, which appears in other words with a good p-value.