Sciweavers

» Crawling the Hidden Web
WISE
2005
Springer
14 years 1 month ago
Constructing Interface Schemas for Search Interfaces of Web Databases
Many databases have become Web-accessible through form-based search interfaces (i.e., search forms) that allow users to specify complex and precise queries to access the underlying...
Hai He, Weiyi Meng, Clement T. Yu, Zonghuan Wu
ACSW
2004
13 years 9 months ago
Discovering Parallel Text from the World Wide Web
Parallel corpus is a rich linguistic resource for various multilingual text management tasks, including crosslingual text retrieval, multilingual computational linguistics and mul...
Jisong Chen, Rowena Chau, Chung-Hsing Yeh
IADIS
2003
13 years 9 months ago
SPLAT: A System for Self-Plagiarism Detection
This paper presents a system for self-plagiarism detection, SPLAT. The system uses a WebL web spider that crawls through the web sites of the top fifty Computer Science department...
Christian S. Collberg, Stephen G. Kobourov, Joshua...
CLEF
2005
Springer
14 years 1 month ago
EuroGOV: Engineering a Multilingual Web Corpus
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...
Börkur Sigurbjörnsson, Jaap Kamps, Maart...
WWW
2010
ACM
14 years 2 months ago
Not so creepy crawler: easy crawler generation with standard XML queries
Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far m...
Franziska von dem Bussche, Klara A. Weiand, Benedi...