Abstract. Previous research on advanced representations for document retrieval has shown that statistical state-of-the-art models are not improved by a variety of different ling...
To address the retrieval problem posed by the overwhelming number of documents online, the adaptation of text categorization to web units has recently been actively pursued. The aim is to utiliz...
The Mixed Raster Content (MRC) ITU document compression standard (T.44) specifies a multilayer decomposition model for compound documents into two contone image layers and a binar...
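As a rough illustration of the layered model this abstract refers to, the sketch below recombines two image layers and a binary mask into a single image. It assumes that a mask value of 1 selects the foreground pixel and 0 selects the background pixel, and that all layers share the same resolution. The function name `compose_mrc` and the list-of-lists image representation are hypothetical, chosen only for illustration. It is not taken from the paper or from the T.44 text.

```python
def compose_mrc(foreground, background, mask):
    """Recombine three equally sized layers into one image.

    Assumption: mask value 1 selects the foreground pixel,
    mask value 0 selects the background pixel.
    Images are plain lists of rows (hypothetical representation).
    """
    if not (len(foreground) == len(background) == len(mask)):
        raise ValueError("all layers must have the same number of rows")

    composite = []
    for fg_row, bg_row, mask_row in zip(foreground, background, mask):
        if not (len(fg_row) == len(bg_row) == len(mask_row)):
            raise ValueError("all layers must have the same row length")
        # Pick each output pixel from the layer the mask points at.
        composite.append(
            [fg if m else bg for fg, bg, m in zip(fg_row, bg_row, mask_row)]
        )
    return composite


# Usage: dark text pixels (foreground) over a light page (background).
foreground = [[0, 0], [0, 0]]
background = [[255, 255], [255, 255]]
mask = [[1, 0], [0, 1]]
result = compose_mrc(foreground, background, mask)
```

The point of the decomposition is that each layer can then be compressed with a coder suited to its content, while the mask alone carries the sharp text and line-art edges.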
In this paper, we present the AutoCat system for product classification. AutoCat uses a vector space model, modified to consider product attributes unavailable in traditional docu...
Abstract. We address the problems of pattern matching and approximate pattern matching in the sketching model. We show that it is impossible to compress the text into a small sketc...
Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer, ...