 Tools for Detection of Short Motifs in DNA Sequences
YMF is a program that detects statistically overrepresented words (motifs) in DNA sequences. The user may specify the characteristics of the motifs to be detected. A motif here is a short string of nucleotides, degenerate symbols, and spacers. 'Motif size' is the number of non-spacer characters in a motif. Spacers ('N's) are constrained to be in the center of the motif. Degenerate symbols allowed in a motif are R (purine - A or G), Y (pyrimidine - C or T), W (A or T), and S (C or G).
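As an illustration of the motif alphabet described above, the following minimal Python sketch computes motif size and counts occurrences of a motif in a sequence. It is not YMF's code: the function names are hypothetical, and counting overlapping occurrences on a single strand is an assumption made here for simplicity.

```python
import re

# Symbols listed in the text, mapped to regular-expression character classes.
SYMBOLS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "W": "[AT]",
    "S": "[CG]",
    "N": "[ACGT]",  # spacer: matches any nucleotide
}

def motif_size(motif):
    """Return the number of non-spacer characters in the motif."""
    return sum(1 for ch in motif if ch != "N")

def motif_to_regex(motif):
    """Compile the motif into a regex; the lookahead lets matches overlap."""
    body = "".join(SYMBOLS[ch] for ch in motif)
    return re.compile("(?=" + body + ")")

def count_occurrences(motif, sequence):
    """Count (possibly overlapping) occurrences on the given strand only.

    Assumption: single-strand, overlapping counts. The text does not say
    how YMF itself counts occurrences.
    """
    pattern = motif_to_regex(motif)
    return sum(1 for _ in pattern.finditer(sequence.upper()))
```

For example, `motif_size("ACGNNNCGT")` counts only the six non-spacer characters, and `count_occurrences("RCGT", "ACGTGCGT")` finds both `ACGT` and `GCGT`.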

Given a set of sequences, YMF does an enumerative search among all motifs that match the specified characteristics, scores each motif for its significance, and outputs the top several motifs, sorted by significance. The significance of a motif is measured by the Z-score of its count in the input sequences. If a motif occurs N times in the input sequences, but was expected to occur E times (with a standard deviation of S) in random sequences of the same length generated by a suitable background model, then the Z-score of the motif is (N-E)/S.
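The Z-score and the ranking step can be sketched in a few lines of Python. This is only an illustration of the formula (N-E)/S: the function names and the input layout are hypothetical, and the values of E and S are assumed to be supplied by some background model rather than computed here.

```python
def z_score(observed, expected, std_dev):
    """Return (N - E) / S for one motif."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (observed - expected) / std_dev

def rank_motifs(stats, top=5):
    """Return the `top` motifs with the highest Z-scores.

    `stats` maps a motif string to a tuple (N, E, S). This layout is an
    assumption made for the example, not YMF's input or output format.
    """
    scored = [(z_score(n, e, s), motif) for motif, (n, e, s) in stats.items()]
    scored.sort(reverse=True)  # highest Z-score first
    return [(motif, z) for z, motif in scored[:top]]
```

With made-up numbers, a motif seen N = 30 times when E = 12 and S = 4 has a Z-score of (30 - 12)/4 = 4.5.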

YMF constructs a third-order Markov model of the background sequences (e.g., all known promoter sequences) for the organism under study. Such models have already been constructed for some model organisms, and can be used by selecting the appropriate organism in the 'Organism' field. Users may also construct their own background models by following the "Work with your own organism" link at the bottom of the YMF page.
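In a third-order Markov model, the probability of each nucleotide depends on the three nucleotides that precede it. A minimal sketch of estimating such a model from background sequences follows. The function name is hypothetical, and the plain frequency estimate without smoothing is an assumption; the text does not describe how YMF estimates its parameters.

```python
from collections import defaultdict

def train_markov3(sequences):
    """Estimate P(next base | previous three bases) from raw counts.

    Assumptions: no smoothing or pseudocounts, and windows containing
    characters other than A, C, G, T are skipped.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        seq = seq.upper()
        for i in range(3, len(seq)):
            context, nxt = seq[i - 3:i], seq[i]
            if set(context + nxt) <= set("ACGT"):
                counts[context][nxt] += 1

    model = {}
    for context, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalize counts into a probability distribution over A, C, G, T.
        model[context] = {b: next_counts.get(b, 0) / total for b in "ACGT"}
    return model
```

For the toy input `["ACGTACGTACGA"]`, the context `ACG` is followed by `T` twice and by `A` once, so the estimated probability of `T` after `ACG` is 2/3. A real background set would be far larger, and contexts never observed in it would need some form of smoothing.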

FindExplanators is a program that extracts a smaller set of "real" motifs from the set of significant motifs reported by YMF. More specifically, given a set of DNA sequences P and a set of motifs M (such as those reported by YMF), it extracts a subset E of the motifs in M such that, once the occurrences of the motifs of E in the sequences P are accounted for, the remaining motifs in M are no longer statistically significant.

User-created organisms are stored on the server, and cookies are used to ensure that only the creator of an organism can access its information. However, the webmaster may remove this information without notice to the user in order to meet space constraints on the server.

