
ICML
2006
IEEE

Fast and space efficient string kernels using suffix arrays

String kernels that compare the set of all common substrings between two given strings have recently been proposed by Vishwanathan & Smola (2004). Surprisingly, these kernels can be computed in linear time and linear space using annotated suffix trees. Although the suffix tree based algorithm requires O(n) space in theory for a string of length n, in practice it needs at least 40n bytes: 20n bytes for storing the suffix tree and an additional 20n bytes for the annotation. This large memory requirement, coupled with the poor locality of memory access inherent in suffix trees, means that the performance of the suffix tree based algorithm deteriorates on large strings. In this paper, we describe a new linear time yet space efficient and scalable algorithm for computing string kernels, based on suffix arrays. Our algorithm a) is faster and easier to implement, b) requires only 19n bytes of storage on average, and c) exhibits strong locality of memory access. W...
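To make the idea in the abstract concrete, here is a minimal sketch of an all-substrings kernel computed from a suffix array and its LCP (longest common prefix) array. It is not the paper's algorithm: it assumes an unweighted kernel (every common substring counts with weight 1), builds the suffix array by naive sorting, and sums pairwise LCP values in quadratic time rather than linear time. The function names and the separator character are hypothetical choices made for illustration only.

```python
from collections import Counter


def suffix_array(s):
    # Naive construction by sorting suffixes; simple but far from linear time.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # Kasai-style computation: lcp[r] is the length of the common prefix
    # of the suffixes at sorted positions r-1 and r (lcp[0] is 0).
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def substring_kernel(x, y, sep="\x00"):
    # Assumption: sep occurs in neither x nor y, so no match crosses it.
    # k(x, y) = sum over substrings s of count_x(s) * count_y(s), which equals
    # the sum of LCP lengths over all (suffix of x, suffix of y) pairs.
    s = x + sep + y
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    nx = len(x)
    total = 0
    for a in range(len(sa)):
        running = None
        for b in range(a + 1, len(sa)):
            # LCP of two suffixes is the minimum of the LCP entries between them.
            running = lcp[b] if running is None else min(running, lcp[b])
            if running == 0:
                break  # the minimum can only stay at zero from here on
            # Count only pairs with one suffix from x and the other from y.
            if (sa[a] < nx) != (sa[b] < nx):
                total += running
    return total


def brute_force_kernel(x, y):
    # Reference implementation: enumerate every substring explicitly.
    def counts(t):
        return Counter(t[i:j] for i in range(len(t))
                       for j in range(i + 1, len(t) + 1))
    cx, cy = counts(x), counts(y)
    return sum(cx[sub] * cy[sub] for sub in cx if sub in cy)


print(substring_kernel("abab", "bab"))  # 10, matching brute_force_kernel
```

The suffix array and LCP array here are flat integer lists, which hints at why array-based layouts tend to have better memory locality than pointer-based suffix trees; the byte counts quoted in the abstract refer to the authors' implementation, not to this sketch.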
Choon Hui Teo, S. V. N. Vishwanathan
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2006
Where ICML
Authors Choon Hui Teo, S. V. N. Vishwanathan