Efficient Computation of Substring Equivalence Classes with Suffix Arrays



This paper considers the enumeration of the substring equivalence classes introduced by Blumer et al. [1], who used these classes to define an index structure called the compact directed acyclic word graph (CDAWG). In text analysis, these equivalence classes are useful because they group together redundant substrings that have essentially identical occurrences. In this paper, we show how to enumerate the equivalence classes using suffix arrays. Our algorithm uses the rank and lcp arrays to traverse the corresponding suffix tree and requires no other additional data structure. It runs in time linear in the length of the input string. We present experimental results comparing the running time and space consumption of our algorithm with those of approaches based on suffix trees and on CDAWGs.
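To make the objects concrete, the following Python sketch assumes the usual formulation of the equivalence: for a non-empty substring x of the text, ↔x is obtained by extending x to the left and to the right for as long as every occurrence of x is preceded (respectively followed) by the same character, and x ≡ y exactly when ↔x = ↔y. The hypothetical function `classes_naive` groups substrings by brute force straight from that definition. The hypothetical function `representatives_sa` instead reads the class representatives off a suffix array, its rank (inverse) array and an lcp array computed with Kasai et al.'s algorithm, visiting the internal suffix-tree nodes through the well-known stack-based lcp-interval traversal. This is not the authors' linear-time algorithm: the suffix array is built by naive sorting and the left-context check rescans each interval. It only illustrates which suffix-tree nodes represent classes.

```python
from collections import defaultdict


def occurrences(text, x):
    """Start positions of all (possibly overlapping) occurrences of x in text."""
    return [i for i in range(len(text) - len(x) + 1) if text.startswith(x, i)]


def maximal_extension(text, x):
    """Naive <->x under the assumed definition: grow x leftwards, then rightwards,
    while every occurrence is preceded/followed by one and the same character."""
    occ = occurrences(text, x)
    length = len(x)
    while all(p > 0 for p in occ) and len({text[p - 1] for p in occ}) == 1:
        occ = [p - 1 for p in occ]  # every occurrence shifts one step left
        length += 1
    while all(p + length < len(text) for p in occ) and len({text[p + length] for p in occ}) == 1:
        length += 1
    return text[occ[0]:occ[0] + length]


def classes_naive(text):
    """Map each representative <->x to the set of non-empty substrings in its class."""
    classes = defaultdict(set)
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            x = text[i:j]
            classes[maximal_extension(text, x)].add(x)
    return dict(classes)


def suffix_array(text):
    """Naive construction by sorting suffixes; a shorter suffix sorts before
    any longer suffix it prefixes, like an implicit end-of-text terminator."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def rank_and_lcp(text, sa):
    """rank is the inverse of sa; lcp[r] is the longest common prefix length of
    the suffixes at sa[r - 1] and sa[r] (Kasai et al.), with lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r
    lcp = [0] * n
    h = 0
    for p in range(n):  # in text order, h drops by at most 1 per step
        if rank[p] == 0:
            h = 0
            continue
        q = sa[rank[p] - 1]
        while p + h < n and q + h < n and text[p + h] == text[q + h]:
            h += 1
        lcp[rank[p]] = h
        h = max(h - 1, 0)
    return rank, lcp


def lcp_intervals(lcp):
    """Yield (l, lb, rb) for each lcp-interval with l > 0, i.e. each internal
    non-root node of the suffix tree, via a stack-based bottom-up traversal."""
    stack = [(0, 0)]  # (lcp value, left bound)
    for i in range(1, len(lcp) + 1):
        cur = lcp[i] if i < len(lcp) else 0
        lb = i - 1
        while cur < stack[-1][0]:
            l, lb = stack.pop()
            yield l, lb, i - 1
        if cur > stack[-1][0]:
            stack.append((cur, lb))


def representatives_sa(text):
    """Class representatives read off the suffix array, rank and lcp arrays."""
    sa = suffix_array(text)
    _rank, lcp = rank_and_lcp(text, sa)
    reps = {text}  # every substring occurring once extends to the whole text
    for l, lb, rb in lcp_intervals(lcp):
        starts = sa[lb:rb + 1]
        # keep the node only if it cannot be extended to the left either
        if 0 in starts or len({text[p - 1] for p in starts}) > 1:
            reps.add(text[sa[lb]:sa[lb] + l])
    return reps
```

For example, `sorted(representatives_sa("abcab"))` gives `['ab', 'abcab']`, and `sorted(classes_naive("abcab")["ab"])` gives `['a', 'ab', 'b']`: both `a` and `b` occur exactly where `ab` does, while every substring that occurs only once extends to the whole text.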

Kazuyuki Narisawa, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda


Combinatorics | Corresponding Suffix Trees | CPM 2007 | Equivalence Classes | Suffix Tree


» Fast and space efficient string kernels using suffix arrays

» The Average Common Substring Approach to Phylogenomic Reconstruction

» Efficient computation of absent words in genomic sequences

» Information Theoretic Approaches to Whole Genome Phylogenies

» Two-dimensional interleaving schemes with repetitions: Constructions and bounds

Post Info

Added	14 Aug 2010
Updated	14 Aug 2010
Type	Conference
Year	2007
Where	CPM
Authors	Kazuyuki Narisawa, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda
