Abstract—We propose a new method for collecting information on regulatory elements found by any motif discovery program. We suggest that combining the results of n leave-one-out motif discovery runs provides additional information. By discovering motifs in n − 1 of the sequences and scoring them on the remaining held-out sequence, we overcome some of the issues arising from noisy data and identify more high-quality motifs. We describe preliminary investigations of this approach, using MEME for motif discovery. We show that the leave-one-out method highlights motifs different from those highlighted by a single MEME run. We demonstrate that our method increases the power obtainable from small datasets. We also explore how the information gain of the method changes as the number of sequences increases. Our approach generalizes to any number of sequences and may be applied with any motif-inference package that generates a final population of solutions and scores.
Audrey Girouard, Noah W. Smith, Donna K. Slonim
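The leave-one-out procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it does not call MEME: `toy_discover` and `toy_score` are hypothetical stand-ins (shared k-mer counting and best ungapped match count) for a real motif finder and its scoring function, and every name below is an assumption made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List


def toy_discover(seqs: List[str], k: int = 4, top: int = 3) -> List[str]:
    """Hypothetical stand-in for a motif finder (NOT MEME).

    Returns the `top` k-mers that occur in the largest number of sequences.
    """
    counts: Counter = Counter()
    for s in seqs:
        # Count each k-mer at most once per sequence.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    # Sort by descending count, then alphabetically, so results are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [kmer for kmer, _ in ranked[:top]]


def toy_score(motif: str, seq: str) -> int:
    """Hypothetical score: best number of matching positions at any offset."""
    k = len(motif)
    if len(seq) < k:
        return 0
    return max(
        sum(a == b for a, b in zip(motif, seq[i:i + k]))
        for i in range(len(seq) - k + 1)
    )


def leave_one_out(
    seqs: List[str],
    discover: Callable[[List[str]], List[str]] = toy_discover,
    score: Callable[[str, str], int] = toy_score,
) -> Dict[str, List[int]]:
    """Run n discovery passes, each on n - 1 sequences, and score every
    discovered motif on the sequence that was held out of that pass."""
    results: Dict[str, List[int]] = {}
    for i, held_out in enumerate(seqs):
        training = seqs[:i] + seqs[i + 1:]  # the other n - 1 sequences
        for motif in discover(training):
            results.setdefault(motif, []).append(score(motif, held_out))
    return results
```

Calling `leave_one_out(sequences)` returns, for each motif, the list of its held-out scores across the runs in which it was discovered. Motifs that recur in many runs and score well on the held-out sequences are the ones this scheme would highlight; how those scores are combined is left open here, since the abstract does not specify it.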