Current crawler-based search engines usually return a long list of search results containing many noise documents. By indexing collected documents on topic path in taxonomy, t...
In this paper we show how approximate matrix factorisations can be used to organise document summaries returned by a search engine into meaningful thematic categories. We...
Web spam detection has become one of the top challenges for the Internet search industry. Instead of relying on heuristic rules, we propose a feature re-extraction strategy to opt...
Activities such as Web Services and the Semantic Web are working to create a web of distributed, machine-understandable data. In this paper we present an application called Semanti...
We introduce a technique for creating novel, textually-enhanced thumbnails of Web pages. These thumbnails combine the advantages of image thumbnails and text summaries to provide c...
Allison Woodruff, Andrew Faulring, Ruth Rosenholtz...