Sciweavers

52 search results - page 5 / 11
» Finding near-duplicate web pages: a large-scale evaluation o...
WEBI
2005
Springer
14 years 28 days ago
Standardized Evaluation Method for Web Clustering Results
Finding a set of web pages relevant to a user’s information goal is difficult due to the enormous size of the Internet. Search engines are able to find a set of pages that mat...
Daniel Crabtree, Xiaoying Gao, Peter Andreae
SIGIR
2005
ACM
14 years 1 month ago
Title extraction from bodies of HTML documents and its application to web page retrieval
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Sh...
PAKDD
2009
ACM
14 years 4 months ago
Detecting Link Hijacking by Web Spammers
Since current search engines employ link-based ranking algorithms as an important tool to decide a ranking of sites, Web spammers are making a significant effort to man...
Masaru Kitsuregawa, Masashi Toyoda, Young-joo Chun...
WWW
2003
ACM
14 years 8 months ago
SemTag and Seeker: bootstrapping the semantic web via automated semantic annotation
This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. ...
Stephen Dill, Nadav Eiron, David Gibson, Daniel Gr...
CIKM
2010
Springer
13 years 5 months ago
Wisdom of the ages: toward delivering the children's web with the link-based AgeRank algorithm
Though children frequently use web search engines to learn, interact, and be entertained, modern web search engines are poorly suited to children's needs, requiring relativel...
Karl Gyllstrom, Marie-Francine Moens