This paper introduces a statistical model for query-relevant summarization: succinctly characterizing the relevance of a document to a query. Learning parameter values for the pro...
Relying on the idea that back-of-the-book indexes are traditional devices for navigation through large documents, we have developed a method to build a hypertextual network that h...
This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising s...
This paper describes a document retrieval system called CAIRN that uses a case-based reasoning set using a large lexicon to automatically generate a case index to that document se...
Performance evaluation of document recognition systems is a difficult and practically important problem. Issues arise in defining requirements, in characterizing the system's...