A thesaurus and an ontology provide a set of structured terms, phrases, and metadata, often in a hierarchical arrangement, that may be used to index, search, and mine documents. W...
The number of patent documents is currently rising rapidly worldwide, creating the need for an automatic categorization system to replace time-consuming and labor-intensive manual...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...
This paper studies a special “small” information retrieval problem where user satisfaction depends only on the ordering of documents. We look for a retrieval performance meas...
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particula...
Daniel M. Dunlavy, Dianne P. O'Leary, John M. Conr...
Through the recent NTCIR workshops, patent retrieval has posed many challenging issues for the information retrieval community. Unlike newspaper articles, patent documents are very long an...
In-Su Kang, Seung-Hoon Na, Jungi Kim, Jong-Hyeok L...
This article examines the application of two single-document sentence compression techniques to the problem of multi-document summarization—a “parse-and-trim” approach and a...
David M. Zajic, Bonnie J. Dorr, Jimmy J. Lin, Rich...
In this paper, we propose a multi-strategic matching and merging approach to find correspondences between ontologies based on the syntactic or semantic characteristics and constr...
For European languages, n-grams have proved to be a cost-effective alternative to morphological processing during indexing, and they have been studied and analyzed extensively us...
Most knowledge accumulated through scientific discoveries in genomics and related biomedical disciplines is buried in the vast amount of biomedical literature. Since understandin...