This paper presents the result of an adaptive region growing segmentation technique for color document images using an irregular pyramid structure. The emphasis is in the segmentat...
This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...
Current studies on the storage of XML data are focused on either the efficient mapping of XML data onto an existing RDBMS or the development of a native XML storage. Some native X...
Most traditional text clustering methods are based on "bag of words" (BOW) representation based on frequency statistics in a set of documents. BOW, however, ignores the ...
Jian Hu, Lujun Fang, Yang Cao, Hua-Jun Zeng, Hua L...
This paper describes our first large-scale retrieval attempt in TREC-7 using DSIR. DSIR is a vector space based retrieval system in which semantic similarity between words, docume...