Similarity search in texts, notably in biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been create...
Given a set of strings U = {T1, T2, ..., T }, the longest common repeat problem is to find the longest common substring that appears at least twice in each string of U, conside...
Over the last decade, the cost of producing genomic sequences has dropped dramatically due to so-called “next-gen” sequencing methods. However, these next-gen seque...
For historical documents, available transcriptions are typically inaccurate when compared with the scanned document images. Not only the position of the words and sentences are ...
In this paper we describe a new class of representations for real-valued parameters called Center of Mass Encoding (CoME). CoME is based on variable-length strings, it is self-adap...