Sciweavers

218 search results - page 13 / 44
» Crawling for Images on the WWW
WWW
2006
ACM
14 years 10 months ago
Detecting nepotistic links by language model disagreement
In this short note we demonstrate the applicability of hyperlink downweighting by means of language model disagreement. The method filters out hyperlinks with no relevance to the ...
András A. Benczúr, István Bí...
WWW
2006
ACM
14 years 10 months ago
Status of the African Web
As part of the Language Observatory Project [4], we have been crawling all the web space since 2004. We have collected terabytes of data mostly from Asian and African ccTLDs. In t...
Rizza Camus Caminero, Pavol Zavarsky, Yoshiki Mika...
WWW
2005
ACM
14 years 10 months ago
Analyzing online discussion for marketing intelligence
We present a system that gathers and analyzes online discussion as it relates to consumer products. Weblogs and online message boards provide forums that record the voice of the p...
Natalie S. Glance, Matthew Hurst, Kamal Nigam, Mat...
WWW
2005
ACM
14 years 10 months ago
Exploiting the deep web with DynaBot: matching, probing, and ranking
We present the design of Dynabot, a guided Deep Web discovery system. Dynabot's modular architecture supports focused crawling of the Deep Web with an emphasis on matching, p...
Daniel Rocco, James Caverlee, Ling Liu, Terence Cr...
WWW
2009
ACM
14 years 10 months ago
Data quality in web archiving
Web archives preserve the history of Web sites and have high long-term value for media and business analysts. Such archives are maintained by periodically re-crawling entire Web s...
Marc Spaniol, Dimitar Denev, Arturas Mazeika, Gerh...