A wealth of information is available only in web pages, patents, publications etc. Extracting information from such sources is challenging, both due to the typically complex langu...
—We present an intelligent agent crawler designed to collect user-generated content in Second Life and related virtual worlds. The agents navigate autonomously through the world ...
Understanding query reformulation patterns is a key step towards next generation web search engines: it can help improving users’ web-search experience by predicting their inten...
Paolo Boldi, Francesco Bonchi, Carlos Castillo, Se...
Previous anti-spamming algorithms based on link structure suffer from either the weakness of the page value metric or the vagueness of the seed selection. In this paper, we propos...
Despite the widespread use of BM25, there have been few studies examining its effectiveness on a document description over single and multiple field combinations. We determine t...