Abstract. The ability to discover the topic of a large set of text documents using relevant keyphrases is usually regarded as a very tedious task if done by hand. Automatic keyphra...
Khaled M. Hammouda, Diego N. Matute, Mohamed S. Ka...
Structured link vector model (SLVM) is a recently proposed document representation that takes into account both structural and semantic information for measuring XML document simi...
This paper presents a cluster validation based document clustering algorithm, which is capable of identifying both important feature words and true model order (cluster number). I...
The sipping of ink through the pages of certain double-sided handwritten documents after long periods of storage poses a serious problem to human readers or OCR systems. This pape...
We describe a new approach for evaluating page segmentation algorithms. Unlike techniques that rely on OCR output, our method is region-based: the segmentation output, described a...