sequence content at an abstract level and offers novel ways to examine the information contained in them. Our approach is an information-theoretic search process that uses pattern-matching techniques to process the sequence data. A preliminary evaluation on the Drosophila genome found a number of irregular patterns, including a histone gene cluster. Discovering new patterns is an important problem in both whole-genome and comparative genomics applications. We intend to use this research as a launch pad toward a comprehensive information-theoretic framework for pattern and knowledge discovery on genomic data.
Willard Davis, Ananth Kalyanaraman, Diane J. Cook