This paper presents a potential seed selection algorithm for web crawlers using a gain - share scoring approach. Initially we consider a set of arbitrarily chosen tourism queries. ...
This paper presents a large-scale system for the recognition and semantic disambiguation of named entities based on information extracted from a large encyclopedic collection and ...
Abstract. Analysis of aggregate and individual Web requests shows that PageRank is a poor predictor of traffic. We use empirical data to characterize properties of Web traffic not ...
In this paper, we continue our investigations of "web spam": the injection of artificially-created pages into the web in order to influence the results from search engin...
Alexandros Ntoulas, Marc Najork, Mark Manasse, Den...
There is an exploding amount of user-generated content on the Web due to the emergence of "Web 2.0" services, such as Blogger, MySpace, Flickr, and del.icio.us. The part...
Ka Cheung Sia, Junghoo Cho, Yun Chi, Belle L. Tsen...