This paper describes a simple clustering approach to person name disambiguation of retrieved documents. The methods are based on standard IR concepts and do not require any task-s...
In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two ...
Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, ...
Documents in many corpora, such as digital libraries and webpages, contain both content and link information. To explicitly consider the document relations represented by links, i...
For the task of near-duplicated document detection, both traditional fingerprinting techniques used in database community and bag-of-word comparison approaches used in information...
In the context of the NAVIDOMASS project, the problematic of this paper concerns the clustering of historical document images. We propose a structural-based framework to handle the...