Sciweavers

472 search results - page 76 / 95
» Crawling the Hidden Web
GFKL
2005
Springer
125 views, Data Mining
14 years 2 months ago
Towards Structure-sensitive Hypertext Categorization
Abstract. Hypertext categorization is the task of automatically assigning category labels to hypertext units. Comparable to text categorization it stays in the area of function lea...
Alexander Mehler, Rüdiger Gleim, Matthias Deh...
WEBDB
2000
Springer
131 views, Database
14 years 16 days ago
Automatic Classification of Text Databases Through Query Probing
Many text databases on the web are "hidden" behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the conte...
Panagiotis G. Ipeirotis, Luis Gravano, Mehran Saha...
BNCOD
2003
139 views, Database
13 years 10 months ago
An Overview about the DynaQuest Framework
The DynaQuest-Framework should reduce the effort for the creation of Internet-based virtual databases. These virtual databases are a special kind of federated database systems whe...
Marco Grawunder
WWW
2005
ACM
14 years 9 months ago
Fully automatic wrapper generation for search engines
When a query is submitted to a search engine, the search engine returns a dynamically generated result page containing the result records, each of which usually consists of a link...
Hongkun Zhao, Weiyi Meng, Zonghuan Wu, Vijay Ragha...
WEBI
2007
Springer
14 years 3 months ago
Determining Bias to Search Engines from Robots.txt
Search engines largely rely on robots (i.e., crawlers or spiders) to collect information from the Web. Such crawling activities can be regulated from the server side by deploying ...
Yang Sun, Ziming Zhuang, Isaac G. Councill, C. Lee...