Sciweavers

» Web Crawling
SIGIR
2008
ACM
13 years 9 months ago
Classifiers without borders: incorporating fielded text from neighboring web pages
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...
Xiaoguang Qi, Brian D. Davison
WWW
2007
ACM
14 years 10 months ago
Optimizing web search using social annotations
This paper explores the use of social annotations to improve web search. Nowadays, many services, e.g. del.icio.us, have been developed for web users to organize and share their f...
Shenghua Bao, Gui-Rong Xue, Xiaoyuan Wu, Yong Yu, ...
PAKDD
2009
Springer
14 years 4 months ago
Scalable Web Mining with Newistic
Newistic is a web mining platform that collects and analyses documents crawled from the Internet. Although it currently processes news articles, it can be easily adapted ...
Ovidiu Dan, Horatiu Mocian
LAWEB
2003
IEEE
14 years 3 months ago
On the Evolution of Clusters of Near-Duplicate Web Pages
This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
Dennis Fetterly, Mark Manasse, Marc Najork
WWW
2011
ACM
13 years 4 months ago
Design and implementation of contextual information portals
This paper presents a system for enabling offline web use to satisfy the information needs of disconnected communities. We describe the design, implementation, evaluation, and pil...
Jay Chen, Russell Power, Lakshminarayanan Subraman...