This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
The web graph follows the power law distribution and has a hierarchy structure. But neither the PageRank algorithm nor any of its improvements leverage these attributes. In this p...
Yizhou Lu, Benyu Zhang, Wensi Xi, Zheng Chen, Yi L...
We present an algorithm, witch, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph as we...
Jacob Abernethy, Olivier Chapelle, Carlos Castillo
The original PageRank algorithm for improving the ranking of search-query results computes a single vector, using the link structure of the Web, to capture the relative "impor...
The Web is a very large social network. It is important and interesting to understand the “ecology” of the Web: the general relations of Web pages to their environment. The un...