Sciweavers

» Crawling the web for structured documents
WWW
2003
ACM
16 years 4 months ago
The XML web: a first study
Although originally designed for large-scale electronic publishing, XML plays an increasingly important role in the exchange of data on the Web. In fact, it is expected that XML w...
Laurent Mignet, Denilson Barbosa, Pierangelo Veltr...
CIDR
2009
15 years 5 months ago
Extracting and Querying a Comprehensive Web Database
Recent research in domain-independent information extraction holds the promise of an automatically-constructed structured database derived from the Web. A query system based on th...
Michael J. Cafarella
SIGIR
2009
ACM
15 years 10 months ago
Building enriched document representations using aggregated anchor text
It is well known that anchor text plays a critical role in a variety of search tasks performed over hypertextual domains, including enterprise search, wiki search, and web search....
Donald Metzler, Jasmine Novak, Hang Cui, Srihari R...
WWW
2008
ACM
16 years 4 months ago
As we may perceive: finding the boundaries of compound documents on the web
This paper considers the problem of identifying on the Web compound documents (cDocs), groups of web pages that in aggregate constitute semantically coherent information entities...
Pavel Dmitriev
ACSC
2006
IEEE
15 years 10 months ago
Using formal concept analysis with an incremental knowledge acquisition system for web document management
It is necessary to provide a method to store Web information effectively so it can be utilised as a future knowledge resource. A commonly adopted approach is to classify the retri...
Timothy J. Everts, Sung Sik Park, Byeong Ho Kang