The use of genetic programming for probabilistic pattern matching is investigated. Patterns are expressed in a stochastic regular expression language whose semantics are statistically sound and whose syntax promotes efficient manipulation by genetic programming operators. String recognition uses an efficient algorithm based on approaches from conventional regular language recognition. Given a test string, the recognition algorithm computes the probability that the stochastic regular expression generates that string, as well as the probability of generating each of its prefixes. To promote efficiency, an interpretation path is pre-empted once its intermediate computed probability falls below a given cut-off value, which prunes unconstructive interpretation. A few experiments in recognizing stochastic regular languages are discussed. Application of the technology in bioinformatics is in progress.
Brian J. Ross
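
The recognition step summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or notation. The tuple encoding of expressions (`"sym"`, `"cat"`, `"alt"`, `"star"`), the function name `prefix_probs`, and the default cut-off are all hypothetical choices made for this example. Pruning is assumed to mean abandoning a path once its accumulated probability drops below the cut-off, and iterations of a starred expression that consume no input are skipped to guarantee termination.

```python
from collections import defaultdict

# Hypothetical encoding of a stochastic regular expression (not the paper's syntax):
#   ("sym", "a")                     generates the literal "a" with probability 1
#   ("cat", e1, e2)                  e1 followed by e2
#   ("alt", [(w1, e1), (w2, e2)])    choose e_i with probability w_i / sum(w)
#   ("star", e, p)                   iterate e again with probability p, stop with 1 - p

def prefix_probs(expr, s, start=0, mass=1.0, cutoff=1e-6):
    """Return {end: prob}, where prob = mass * P(expr generates exactly s[start:end]).

    Paths whose accumulated probability (mass) is below cutoff are abandoned.
    """
    if mass < cutoff:
        return {}  # prune this interpretation path
    kind = expr[0]
    out = defaultdict(float)
    if kind == "sym":
        if s.startswith(expr[1], start):
            out[start + len(expr[1])] += mass
    elif kind == "cat":
        for mid, p in prefix_probs(expr[1], s, start, mass, cutoff).items():
            for end, q in prefix_probs(expr[2], s, mid, p, cutoff).items():
                out[end] += q
    elif kind == "alt":
        total = sum(w for w, _ in expr[1])
        for w, sub in expr[1]:
            branch = prefix_probs(sub, s, start, mass * w / total, cutoff)
            for end, q in branch.items():
                out[end] += q
    elif kind == "star":
        _, body, p = expr
        out[start] += mass * (1 - p)  # stop iterating at this position
        for mid, q in prefix_probs(body, s, start, mass * p, cutoff).items():
            if mid > start:  # assumption: ignore iterations that consume nothing
                for end, r in prefix_probs(expr, s, mid, q, cutoff).items():
                    out[end] += r
    else:
        raise ValueError(f"unknown expression kind: {kind!r}")
    return dict(out)

a_star = ("star", ("sym", "a"), 0.5)
print(prefix_probs(a_star, "aab"))                      # one entry per matched prefix
print(prefix_probs(("cat", a_star, ("sym", "b")), "aab"))
print(prefix_probs(("cat", a_star, ("sym", "b")), "aab", cutoff=0.2))
```

Each key of the returned dictionary is the length of a prefix of the test string that the expression can generate, so a single pass yields the probabilities for the whole string and for its prefixes. Raising the cut-off trades completeness for speed: in the last call the only successful path carries probability 0.125, so it is pruned and no match is reported.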