Recognition and encoding of digitized historical documents is still a challenging and difficult task. A major problem is the occurrence of unknown glyphs and symbols which might n...
This paper describes a simple clustering approach to person name disambiguation of retrieved documents. The methods are based on standard IR concepts and do not require any task-s...
This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recogniti...
We present an approach to document clustering based on winnowing fingerprints that achieved good values of effectiveness with considerable save in memory space and computation tim...
We consider the problem of retrieving multiple documents relevant to the single subtopics of a given web query, termed “full-subtopic retrieval”. To solve this problem we pres...
Andrea Bernardini, Claudio Carpineto, Massimiliano...