In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs. In this paper, we present a new nonheuristic algorithm, ...
Michael Beckstette, Dirk Strothmann, Robert Homann...
In this article, we propose the use of suffix arrays to efficiently implement n-gram language models with practically unlimited size n. This approach, which is used with ...
We present a linear time and space suffix array (SA) construction algorithm called the SA-IS algorithm. The SA-IS algorithm is novel because of the LMS-substrings used for the pro...
We present a very efficient, in terms of space and access speed, data structure for storing huge natural language data sets. The structure is described as LZ (Ziv Lempel) compresse...
We report on a new experimental analysis of high-order entropy-compressed suffix arrays, which retains the theoretical performance of previous work and represents an improvement in...