Sciweavers

» Extracting Structured Data from Web Pages
ACL
2006
13 years 11 months ago
A DOM Tree Alignment Model for Mining Parallel Data from the Web
This paper presents a new web mining scheme for parallel data acquisition. Based on the Document Object Model (DOM), a web page is represented as a DOM tree. Then a DOM tree align...
Lei Shi, Cheng Niu, Ming Zhou, Jianfeng Gao
BIS
2010
13 years 5 months ago
Comparing Intended and Real Usage in Web Portal: Temporal Logic and Data Mining
Nowadays the software systems, including web portals, are developed from a priori assumptions about how the system will be used. However, frequently these assumptions hold only par...
Jérémy Besson, Ieva Mitasiunaite, Au...
LWA
2008
13 years 11 months ago
Rule-Based Information Extraction for Structured Data Acquisition using TextMarker
Information extraction is concerned with the location of specific items in (unstructured) textual documents, e.g., being applied for the acquisition of structured data. Then, the ...
Martin Atzmüller, Peter Klügl, Frank Pup...
WSDM
2009
ACM
14 years 4 months ago
Clustering the tagged web
Automatically clustering web pages into semantic groups promises improved search and browsing on the web. In this paper, we demonstrate how user-generated tags from large-scale soc...
Daniel Ramage, Paul Heymann, Christopher D. Mannin...
PVLDB
2008
13 years 9 months ago
WebTables: exploring the power of tables on the web
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...