Templates in web sites hurt search engine retrieval performance, especially in content relevance and link analysis. Current template removal methods suffer from processing speed ...
: In this paper we propose and evaluate interfaces for presenting the results of web searches. Sentences, taken from the top retrieved documents, are used as fine-grained represent...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
In this paper, we continue our investigations of "web spam": the injection of artificially-created pages into the web in order to influence the results from search engin...
Alexandros Ntoulas, Marc Najork, Mark Manasse, Den...
Keyword searching is the most common form of document search on the Web. Many Web publishers manually annotate the META tags and titles of their pages with frequently queried phras...
Hung V. Nguyen, P. Velamuru, Deepak Kolippakkam, H...