High findability of documents within a certain cut-off rank is considered an important factor in recall-oriented application domains such as patent or legal document retrieval. ...
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Ranking documents in a selected corpus plays an important role in information retrieval systems. Despite notable advances in this direction, with continuously accumulating text do...
Byung-Hoon Park, Nagiza F. Samatova, Rajesh Munava...
The extension approach of frequent itemset mining can be applied to discover the relations among documents. Several schemes, i.e., n-gram, stemming, stopword removal and term w...
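One way to read "frequent itemset mining applied to relations among documents" is to swap the usual roles. Each preprocessed term becomes a transaction, and the transaction's items are the documents that contain that term. A set of documents is then "frequent" if enough terms are shared by all of its members. The sketch below is a minimal illustration under that assumption. It only counts document pairs (the second level of Apriori). The stopword list, the `preprocess` helper, `frequent_document_pairs`, and the `min_support` and `n` parameters are all hypothetical, and stemming and term weighting are omitted.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "of", "and", "in", "is", "for", "to"}


def preprocess(text, n=1):
    """Lowercase, drop stopwords, and return the set of word n-grams."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def frequent_document_pairs(docs, min_support=2, n=1):
    """Return document pairs that co-occur in at least min_support terms.

    Assumed framing: each term is a transaction whose items are the
    documents containing it, so a pair's support is its shared-term count.
    """
    transactions = {}
    for doc_id, text in docs.items():
        for term in preprocess(text, n):
            transactions.setdefault(term, set()).add(doc_id)

    counts = Counter()
    for doc_ids in transactions.values():
        # Every pair of documents sharing this term gains one unit of support.
        for pair in combinations(sorted(doc_ids), 2):
            counts[pair] += 1

    return {pair: c for pair, c in counts.items() if c >= min_support}


docs = {
    "d1": "frequent itemset mining of documents",
    "d2": "mining frequent patterns in documents",
    "d3": "clustering distributed sites",
}
print(frequent_document_pairs(docs))  # {('d1', 'd2'): 3}
```

Here `d1` and `d2` share three terms after stopword removal ("frequent", "mining", "documents"), so they form the only pair above the threshold. A full Apriori pass would extend surviving pairs to triples and beyond in the same way.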
Document clustering has become an increasingly important task in analyzing huge numbers of documents distributed among various sites. The challenging aspect is to analyze this e...