Previous anti-spamming algorithms based on link structure suffer from either the weakness of the page value metric or the vagueness of the seed selection. In this paper, we propos...
Despite the widespread use of BM25, there have been few studies examining its effectiveness on document descriptions built from single fields and from combinations of multiple fields. We determine t...
Many users need to refer to content in existing files (pictures, tables, emails, web pages, etc.) when they write documents (programs, presentations, proposals, etc.), and o...
This paper explores scientific metrics in citation networks in scientific communities, how they differ in ranking papers and authors, and why. In particular we focus on network eff...
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...