Sciweavers

971 search results - page 139 / 195
» Common Sense from the Web
WWW
2007
ACM
14 years 10 months ago
U-REST: an unsupervised record extraction system
In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...
Yuan Kui Shen, David R. Karger
WWW
2004
ACM
14 years 10 months ago
Enforcing strict model-view separation in template engines
The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of ...
Terence John Parr
DOCENG
2009
ACM
14 years 4 months ago
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan
ICDM
2006
IEEE
14 years 4 months ago
Unsupervised Learning of Tree Alignment Models for Information Extraction
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level...
Philip Zigoris, Damian Eads, Yi Zhang
ECAI
2004
Springer
14 years 3 months ago
Stacked Generalization for Information Extraction
This paper defines a new stacked generalization framework in the context of information extraction (IE) from online sources. The proposed setting removes the constraint of apply...
Georgios Sigletos, Georgios Paliouras, Constantine...