Search Sciweavers | Sciweavers

» Extracting Partial Structures from HTML Documents

ICDAR 2009, IEEE (Document Analysis)

Scalable Feature Extraction from Noisy Documents

Download www.cvc.uab.es

We cope with the metadata recognition in layout-oriented documents. We address the problem as a classification task and propose a method for automatic extraction of relevant featu...

Loïc Lecerf, Boris Chidlovskii

NAACL 2010 (Computational Linguistics)

Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment

Download research.microsoft.com

The quality of a statistical machine translation (SMT) system is heavily dependent upon the amount of parallel sentences used in training. In recent years, there have been several...

Jason R. Smith, Chris Quirk, Kristina Toutanova

ICDM 2006, IEEE (Data Mining)

Unsupervised Learning of Tree Alignment Models for Information Extraction

Download users.soe.ucsc.edu

We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level...

Philip Zigoris, Damian Eads, Yi Zhang

ICDAR 2009, IEEE (Document Analysis)

User-Guided Wrapping of PDF Documents Using Graph Matching Techniques

Download www.cvc.uab.es

There are a number of established products on the market for wrapping (semi-automatic navigation and extraction of data) from web pages. These solutions make use of the inherent...

Tamir Hassan

DL 2000, Springer (Digital Library)

Snowball: Extracting Relations from Large Plain-Text Collections

Download www.cs.columbia.edu

Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use...

Eugene Agichtein, Luis Gravano
