As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
Although text categorization is a burgeoning area of IR research, readily available test collections in this field are surprisingly scarce. We describe a methodology and system (...
Web communities are web virtual broadcasting spaces where people can freely discuss anything. While such communities function as discussion boards, they have even greater value as...
Mayo Clinic's Enterprise Data Trust is a collection of data from patient care, education, research, and administrative transactional systems, organized to support information...
Christopher G. Chute, Scott A. Beck, Thomas B. Fis...
Semantic heterogeneity of information is a major barrier of information and system interoperability. Defining ontology of data and mapping ontologies among heterogeneous informati...