Sciweavers

502 search results - page 20 / 101
» Extracting Partial Structures from HTML Documents
Sort
View
ICDAR
2009
IEEE
14 years 3 months ago
Scalable Feature Extraction from Noisy Documents
We cope with the metadata recognition in layoutoriented documents. We address the problem as a classification task and propose a method for automatic extraction of relevant featu...
Loïc Lecerf, Boris Chidlovskii
NAACL
2010
13 years 6 months ago
Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
The quality of a statistical machine translation (SMT) system is heavily dependent upon the amount of parallel sentences used in training. In recent years, there have been several...
Jason R. Smith, Chris Quirk, Kristina Toutanova
ICDM
2006
IEEE
164views Data Mining» more  ICDM 2006»
14 years 2 months ago
Unsupervised Learning of Tree Alignment Models for Information Extraction
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table– a data structure that better lends itself to high-level...
Philip Zigoris, Damian Eads, Yi Zhang
ICDAR
2009
IEEE
14 years 3 months ago
User-Guided Wrapping of PDF Documents Using Graph Matching Techniques
There are a number of established products on the market for wrapping—semi-automatic navigation and extraction of data—from web pages. These solutions make use of the inherent...
Tamir Hassan
DL
2000
Springer
162views Digital Library» more  DL 2000»
14 years 24 days ago
Snowball: extracting relations from large plain-text collections
Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use...
Eugene Agichtein, Luis Gravano