We present a scalable algorithm for the parallel computation of inverted files for large text collections. The algorithm takes into account an environment of a high bandwidth netw...
Berthier A. Ribeiro-Neto, Joao Paulo Kitajima, Gon...
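The snippet above does not describe the authors' parallel algorithm. As a rough illustration of the data structure only, the following sketch builds an inverted file (each term mapped to the sorted list of ids of the documents containing it). It assumes a simple scheme: each hypothetical processor indexes a contiguous slice of the collection, and the partial inverted files are then merged. The functions `local_index` and `merge_indexes` are illustrative names, and this is not the method of the paper.

```python
from collections import defaultdict

def local_index(docs, first_id):
    """Build a partial inverted file for a slice of the collection.

    first_id is the global id of the slice's first document (hypothetical scheme).
    """
    postings = defaultdict(list)
    for offset, text in enumerate(docs):
        doc_id = first_id + offset
        # set() records each term at most once per document
        for term in set(text.lower().split()):
            postings[term].append(doc_id)
    return postings

def merge_indexes(partials):
    """Merge partial inverted files given in increasing doc-id order.

    Because slices are contiguous and ordered, concatenation keeps every
    posting list sorted without an explicit sort.
    """
    merged = defaultdict(list)
    for part in partials:
        for term, ids in part.items():
            merged[term].extend(ids)
    return dict(merged)

docs = ["parallel inverted files", "inverted files for text", "text collections"]
p = 2  # number of hypothetical processors
size = (len(docs) + p - 1) // p
partials = [local_index(docs[i:i + size], i) for i in range(0, len(docs), size)]
index = merge_indexes(partials)
print(index["inverted"])  # [0, 1]
print(index["text"])      # [1, 2]
```

In a real distributed setting the slices would be indexed on separate machines and the merge would move posting lists over the network. The sketch only shows why contiguous document partitioning makes the merge a simple concatenation.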
The Burrows-Wheeler Transform (BWT) is the basis for many of the most effective compression and self-indexing methods used today. A key to the versatility of the BWT is the abili...
Matthias Petri, Gonzalo Navarro, J. Shane Culpeppe...
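For readers unfamiliar with the transform, the following is a naive sketch of the BWT and its inversion through the LF mapping. It assumes a unique sentinel character `$` that does not occur in the input. Sorting all rotations costs O(n^2 log n), so practical implementations build a suffix array instead. This illustrates the textbook transform only, not the techniques of the paper above.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: last column of the sorted rotations of text + sentinel."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(last, sentinel="$"):
    """Recover the original text from the BWT using the LF mapping."""
    # rank[i] = occurrences of last[i] in last[:i]
    counts = {}
    rank = []
    for ch in last:
        rank.append(counts.get(ch, 0))
        counts[ch] = counts.get(ch, 0) + 1
    # first[c] = number of characters smaller than c, i.e. where c's block starts in F
    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # Start at the row whose first character is the sentinel; its last
    # character is the final character of the text. Walk backwards via LF.
    row = first[sentinel]
    out = []
    for _ in range(len(last) - 1):
        ch = last[row]
        out.append(ch)
        row = first[ch] + rank[row]
    return "".join(reversed(out))

print(bwt("banana"))               # annb$aa
print(inverse_bwt(bwt("banana")))  # banana
```

Starting the walk at `first[sentinel]` rather than at row 0 keeps the inversion correct even when the text contains characters that sort before the sentinel.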
Stemming is a technique that aims to remove common suffixes from words. Thus, words that differ literally but share a common stem may be abstracted by that stem. The under...
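To make the idea concrete, here is a toy suffix-stripping stemmer that conflates literally different words under a shared stem. The suffix list `SUFFIXES` and the minimum stem length `min_stem` are hypothetical choices made for illustration. Real stemmers such as Porter's apply ordered rules with conditions on the remaining stem, and this sketch is not the method described in the entry above.

```python
# Hypothetical suffix list, tried longest-first so "ings" wins over "s".
SUFFIXES = sorted(("ational", "ization", "ness", "ings", "ing", "edly", "ed", "ly", "es", "s"),
                  key=len, reverse=True)

def stem(word, min_stem=3):
    """Strip the longest listed suffix that leaves at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

def group_by_stem(words):
    """Conflate words that reduce to the same stem."""
    groups = {}
    for w in words:
        groups.setdefault(stem(w), []).append(w)
    return groups

print(group_by_stem(["connect", "connected", "connecting", "connects"]))
# {'connect': ['connect', 'connected', 'connecting', 'connects']}
```

The `min_stem` guard keeps short words such as "is" or "bed" from being reduced to one or two letters, a common source of over-stemming in naive suffix strippers.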
The WEBSOM methodology for building very large text archives has a very slow method for extracting meaningful unit labels. This is because the method computes for the relative fre...
Arnulfo P. Azcarraga, Teddy N. Yap Jr., Tat-Seng C...
We address the problems of pattern matching and approximate pattern matching in the sketching model. We show that it is impossible to compress the text into a small sketc...
Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer, ...