The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Ranking documents in a selected corpus plays an important role in information retrieval systems. Despite notable advances in this direction, with continuously accumulating text do...
Byung-Hoon Park, Nagiza F. Samatova, Rajesh Munava...
The availability of large, heterogeneous repositories of electronic documents is increasing rapidly, and the need for flexible, sophisticated document manipulation tools is growi...
Floriana Esposito, Stefano Ferilli, Teresa Maria A...
This paper discusses a new type of semi-supervised document clustering that uses partial supervision to partition a large set of documents. Most clustering methods organizes docum...
An information retrieval IR engine can rank documents based on textual proximityof keywords within each document. In this paper we apply this notion to search across an entire dat...
Roy Goldman, Narayanan Shivakumar, Suresh Venkatas...