Many text documents naturally have two kinds of labels. For example, we may label web pages from universities according to their categories, such as "student" or "fa...
We present a static index pruning method, to be used in ad-hoc document retrieval tasks, that follows a document-centric approach to decide whether a posting for a given term shoul...
Classification algorithms and document representation approaches are two key elements for a successful document classification system. In the past, much work has been conducted to...
A model-based approach for rectifying the camera image of the bound document has been developed: the surface of the document is represented by a general cylindrical surface....
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...