Due to their capability for expressing semantics and relationships among data objects, semi-structured documents have become a common way of representing domain knowledge. Compari...
Henry Tan, Tharam S. Dillon, Fedja Hadzic, Elizabe...
Classification hierarchies are trees where links codify the fact that a node lower in the hierarchy contains documents whose contents are more specific than those one le...
A common limitation of many retrieval models, including the recently proposed axiomatic approaches, is that retrieval scores are solely based on exact (i.e., syntactic) matching o...
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. Few users wish to retri...
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...