In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
This paper addresses the problem of extending an adaptive information filtering system to make decisions about the novelty and redundancy of relevant documents. It argues that rel...
TeNDaX is a collaborative, database-based, real-time editor system. TeNDaX is a new approach to word processing in which documents (i.e., content and structure, tables, images, etc.) ...
In this paper, we introduce the concept of "user policies" and its applications to the browsing of HTML documents. The objective of policies is to specify user preferenc...
This paper presents a novel approach for designing a semi-automatic adaptive OCR for large document image collections in digital libraries. We describe an interactive system for co...
Sachin Rawat, K. S. Sesh Kumar, Million Meshesha, ...