The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a pa...
The web crawler space is often divided into two general areas: full-web crawling and focused crawling. We present netSifter, a crawler system that integrates features from thes...
Keeping people away from litigious information has become one of the most important research areas in network information security. Indeed, Web filtering is used to prevent access to u...
Both human users and crawlers face the problem of finding good start pages to explore some topic. We show how to assist in qualifying pages as start nodes by link-based ranking al...
In this paper, we describe the lessons we learned in developing AgentBuilder, a commercial system for rapidly creating agents that extract information from web sites. AgentBuilder...