Sciweavers

Search results for: Extracting Structured Data from Web Pages
WWW
2003
ACM
14 years 9 months ago
DOM-based content extraction of HTML documents
Web pages often contain clutter (such as pop-up ads, unnecessary images and extraneous links) around the body of an article that distracts a user from actual content. Extraction o...
Suhit Gupta, Gail E. Kaiser, David Neistadt, Peter...
CIKM
2005
Springer
14 years 2 months ago
Versatile structural disambiguation for semantic-aware applications
In this paper, we propose a versatile disambiguation approach which can be used to make explicit the meaning of structure based information such as XML schemas, XML document struc...
Federica Mandreoli, Riccardo Martoglia, Enrico Ron...
WWW
2006
ACM
14 years 9 months ago
What's really new on the web?: identifying new pages from a series of unstable web snapshots
Identifying and tracking new information on the Web is important in sociology, marketing, and survey research, since new trends might be apparent in the new information. Such chan...
Masashi Toyoda, Masaru Kitsuregawa
WWW
2005
ACM
14 years 9 months ago
Thresher: automating the unwrapping of semantic content from the World Wide Web
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Andrew Hogue, David R. Karger
WISE
2000
Springer
14 years 1 month ago
Structured Web Pages Management for Efficient Data Retrieval
David Taniar, Yi Jiang, J. Wenny Rahayu, L. Bishay