Sciweavers

971 search results - page 139 / 195
» Common Sense from the Web
WWW
2007
ACM
16 years 4 months ago
U-REST: an unsupervised record extraction system
In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...
Yuan Kui Shen, David R. Karger
WWW
2004
ACM
16 years 4 months ago
Enforcing strict model-view separation in template engines
The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of ...
Terence John Parr
DOCENG
2009
ACM
15 years 10 months ago
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan
ICDM
2006
IEEE
15 years 10 months ago
Unsupervised Learning of Tree Alignment Models for Information Extraction
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level...
Philip Zigoris, Damian Eads, Yi Zhang
ECAI
2004
Springer
15 years 9 months ago
Stacked Generalization for Information Extraction
This paper defines a new stacked generalization framework in the context of information extraction (IE) from online sources. The proposed setting removes the constraint of apply...
Georgios Sigletos, Georgios Paliouras, Constantine...