Because of the complexity of documents and the variety of applications which must be supported, document understanding requires the integration of image understanding with text un...
Suzanne Liebowitz Taylor, Deborah A. Dahl, Mark Li...
Many documents are available to a computer only as images from paper. However, most natural language processing systems expect their input as character-coded text, which may be di...
Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns wi...
The task in text retrieval is to find the subset of a collection of documents relevant to a user's information request, usually expressed as a set of words. Classically, docu...
In information retrieval, data fusion is a technique for combining the outputs of more than one retrieval strategy which rank documents for retrieval. One of the observations oft...
A common authoring technique involves making annotations on a printed draft and then typing the corrections into a computer at a later date. In this paper, we describe a system th...
Extractive summarization techniques cannot generate document summaries shorter than a single sentence, something that is often required. An ideal summarization system would unders...
Michele Banko, Vibhu O. Mittal, Michael J. Witbroc...
A major challenge in document clustering is the extremely high dimensionality. For example, the vocabulary for a document set can easily be thousands of words. On the other hand, ...
Keyword-based web query languages suffer from a lack of precision when searching for a specific kind of document. Indeed, some documents cannot be simply characterized by a list o...
Ontology languages are being proposed to provide machine-understandable descriptions of resources that permit easy location of these resources. Content managers can also b...