Sciweavers

472 search results - page 10 / 95
» Crawling the Hidden Web
ICDE
2006
IEEE
14 years 1 month ago
Finding Thai Web Pages in Foreign Web Spaces
While the Web has been increasingly recognized as a culturally valuable social artifact, many nations endeavor to create national Web archives for long term preservation. However, ...
Kulwadee Somboonviwat, Takayuki Tamura, Masaru Kit...
ICDM
2006
IEEE
14 years 1 month ago
Unsupervised Learning of Tree Alignment Models for Information Extraction
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level...
Philip Zigoris, Damian Eads, Yi Zhang
SIGIR
2008
ACM
13 years 7 months ago
Compressed collections for simulated crawling
Collections are a fundamental tool for reproducible evaluation of information retrieval techniques. We describe a new method for distributing the document lengths and term counts ...
Alessio Orlandi, Sebastiano Vigna
WWW
2006
ACM
14 years 8 months ago
What's really new on the web?: identifying new pages from a series of unstable web snapshots
Identifying and tracking new information on the Web is important in sociology, marketing, and survey research, since new trends might be apparent in the new information. Such chan...
Masashi Toyoda, Masaru Kitsuregawa