Abstract. There is a common availability of classification terms in online text collections and digital libraries, such as manually assigned keywords or key-phrases from a controll...
Traditional content-based e-mail spam filtering takes into account content of e-mail messages and apply machine learning techniques to infer patterns that discriminate spams from...
Many web links mislead human surfers and automated crawlers because they point to changed content, out-of-date information, or invalid URLs. It is a particular problem for large, ...
The wealth of information on the web makes it an attractive resource for seeking quick answers to simple, factual questions such as "who was the first American in space?"...
In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the...