Sciweavers

232 search results - page 19 / 47
» Query-related data extraction of hidden web documents
Sort
View
WWW
2010
ACM
14 years 2 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han
AUSAI
2003
Springer
14 years 26 days ago
Information Extraction via Path Merging
Abstract. In this paper, we describe a new approach to information extraction that neatly integrates top-down hypothesis driven information with bottom-up data driven information. ...
Robert Dale, Cécile Paris, Marc Tilbrook
WWW
2006
ACM
14 years 8 months ago
Using graph matching techniques to wrap data from PDF documents
Wrapping is the process of navigating a data source, semiautomatically extracting data and transforming it into a form suitable for data processing applications. There are current...
Tamir Hassan, Robert Baumgartner
IAT
2006
IEEE
14 years 1 months ago
Semantic Labeling of Data by Using the Web
The Web consists of a large amount of unstructured information that hardly can be elaborated by automatic agents. In recent years, a considerable number of techniques for informat...
Leonardo Rigutini, Ernesto Di Iorio, Marco Ernande...
ECIR
2008
Springer
13 years 9 months ago
Clustering Template Based Web Documents
More and more documents on the World Wide Web are based on templates. On a technical level this causes those documents to have a quite similar source code and DOM tree structure. G...
Thomas Gottron