Background: Document classification is a wide-spread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...
Recent work on incremental crawling has enabled the indexed document collection of a search engine to be more synchronized with the changing World Wide Web. However, this synchron...
Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey...
Online service providers are engaged in constant conflict with miscreants who try to siphon a portion of legitimate traffic to make illicit profits. We study the abuse of “tr...
Tyler Moore, Nektarios Leontiadis, Nicolas Christi...
We introduce a unified graph representation of the Web, which includes both structural and usage information. We model this graph using a simple union of the Web's hyperlink ...
Barbara Poblete, Carlos Castillo, Aristides Gionis
Despite of the popularity of global search engines, people still suffer from low accuracy of site search. The primary reason lies in the difference of link structures and data sca...