Sciweavers

472 search results - page 58 / 95
» Crawling the Hidden Web
Sort
View
KDD
2008
ACM
183views Data Mining» more  KDD 2008»
14 years 8 months ago
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
ISW
2009
Springer
14 years 2 months ago
Automated Spyware Collection and Analysis
Various online studies on the prevalence of spyware attest overwhelming numbers (up to 80%) of infected home computers. However, the term spyware is ambiguous and can refer to anyt...
Andreas Stamminger, Christopher Kruegel, Giovanni ...
SIGIR
2005
ACM
14 years 1 months ago
Server selection methods in hybrid portal search
The TREC .GOV collection makes a valuable web testbed for distributed information retrieval methods because it is naturally partitioned and includes 725 web-oriented queries with ...
David Hawking, Paul Thomas
WWW
2008
ACM
14 years 8 months ago
Efficiently finding web services using a clustering semantic approach
Efficiently finding Web services on the Web is a challenging issue in service-oriented computing. Currently, UDDI is a standard for publishing and discovery of Web services, and U...
Jiangang Ma, Yanchun Zhang, Jing He
WWW
2001
ACM
14 years 8 months ago
Effective Web data extraction with standard XML technologies
We discuss the problem of Web data extraction and describe an XML-based methodology whose goal extends far beyond simple "screen scraping." An ideal data extraction proc...
Jussi Myllymaki