This work develops a new pattern filtering technique for genomic sequence analysis based on gap sequences, in which the distances between consecutive occurrences of the same symbol are recorded as a sequence of integers. Sequence alignment and similarity testing can then be performed on a family of gap sequences over selected patterns, which offers a new approach to structural analysis of sequences. A match between gap sequences is regarded as a frame match, whereas a true match requires both a frame match and a stuffing match. Simulation results show that an extended gap match indicates a correspondingly extended segment in the original genomic sequence. The conventional alignment and scoring methods can thus be generalized in a more adaptive way.
Shih-Chieh Su, Chia H. Yeh, C. C. Jay Kuo
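
As an illustration of the gap sequence idea described in the abstract, the following is a minimal Python sketch, not the authors' implementation. The function names `gap_sequence` and `is_frame_match`, the toy sequences, and the choice to allow overlapping pattern occurrences are all assumptions made for this example. It treats a frame match simply as equality of two gap sequences for one pattern; the abstract does not specify how frame or stuffing matches are scored.

```python
def gap_sequence(seq: str, pattern: str) -> list[int]:
    """Return the distances between consecutive occurrences of `pattern` in `seq`.

    Assumption for this sketch: overlapping occurrences are counted.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        start = seq.find(pattern, start + 1)  # step by 1 to allow overlaps
    # Gap = difference between each position and the one before it.
    return [b - a for a, b in zip(positions, positions[1:])]


def is_frame_match(seq_a: str, seq_b: str, pattern: str) -> bool:
    """Hypothetical frame-match test: the two gap sequences are identical."""
    return gap_sequence(seq_a, pattern) == gap_sequence(seq_b, pattern)


# Toy sequences (made up for illustration): 'A' occurs at 0, 3, 6, 8 in both.
a = "ACGATTAGA"
b = "AGGATCAGA"
print(gap_sequence(a, "A"))       # [3, 3, 2]
print(is_frame_match(a, b, "A"))  # True: same frame for pattern 'A'
print(a == b)                     # False: the symbols between the A's differ
```

In this toy case the two sequences share a frame for the pattern `A` but differ in the symbols between occurrences, which is the situation where, in the abstract's terms, a frame match alone does not amount to a true match.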