Sciweavers

708 search results - page 34 / 142
» Identifying Content Blocks from Web Documents
Sort
View
SAC
2006
ACM
14 years 2 months ago
Template detection for large scale search engines
Templates in web sites hurt search engine retrieval performance, especially in content relevance and link analysis. Current template removal methods suffer from processing speed ...
Liang Chen, Shaozhi Ye, Xing Li
INTERACT
2003
13 years 10 months ago
A Granular Approach to Web Search Result Presentation
: In this paper we propose and evaluate interfaces for presenting the results of web searches. Sentences, taken from the top retrieved documents, are used as fine-grained represent...
Ryen W. White
WWW
2010
ACM
14 years 3 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han
WWW
2006
ACM
14 years 9 months ago
Detecting spam web pages through content analysis
In this paper, we continue our investigations of "web spam": the injection of artificially-created pages into the web in order to influence the results from search engin...
Alexandros Ntoulas, Marc Najork, Mark Manasse, Den...
APWEB
2003
Springer
14 years 9 days ago
Mining "Hidden Phrase" Definitions from the Web
Keyword searching is the most common form of document search on the Web. Many Web publishers manually annotate the META tags and titles of their pages with frequently queried phras...
Hung V. Nguyen, P. Velamuru, Deepak Kolippakkam, H...