EMNLP
2007

Bootstrapping Information Extraction from Field Books

We present two machine learning approaches to information extraction from semi-structured documents that can be used if no annotated training data are available, but there does exist a database filled with information derived from the type of documents to be processed. One approach employs standard supervised learning for information extraction by artificially constructing labelled training data from the contents of the database. The second approach combines unsupervised Hidden Markov modelling with language models. Empirical evaluation of both systems suggests that it is possible to bootstrap a field segmenter from a database alone. The combination of Hidden Markov and language modelling was found to perform best at this task.
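The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea behind combining Hidden Markov modelling with language models for field segmentation: each HMM state is a database field, and each state's emission scores come from a language model built from that field's database values. Everything here is an assumption for illustration: the toy `DATABASE`, the field names, the add-one-smoothed unigram models, and the hand-set transition probability `stay` are hypothetical and are not the authors' model. In particular, the paper's HMM is trained unsupervised, whereas this sketch simply fixes the transitions.

```python
import math
from collections import Counter

# Hypothetical toy database standing in for the real one: each record maps a
# field name to the text value stored for it. Not data from the paper.
DATABASE = [
    {"species": "Rana temporaria", "locality": "near Leiden", "collector": "J. de Vries"},
    {"species": "Bufo bufo", "locality": "Leiden dunes", "collector": "P. Jansen"},
    {"species": "Rana esculenta", "locality": "near Utrecht", "collector": "J. Jansen"},
]


def train_field_lms(records):
    """Build one add-one-smoothed unigram language model per database field.

    Returns a dict mapping field name -> function(token) -> log probability.
    """
    counts = {}
    for record in records:
        for field, value in record.items():
            counts.setdefault(field, Counter()).update(value.lower().split())
    vocab = set()
    for field_counts in counts.values():
        vocab.update(field_counts)
    vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
    lms = {}
    for field, field_counts in counts.items():
        total = sum(field_counts.values())
        # Default arguments bind this field's counts to its own lambda.
        lms[field] = lambda tok, c=field_counts, n=total: math.log(
            (c[tok] + 1) / (n + vocab_size)
        )
    return lms


def segment(tokens, lms, stay=0.5):
    """Label each token with a field via Viterbi decoding.

    HMM states are fields; emission scores come from the field language
    models. Transitions are a hand-set assumption: stay in the current field
    with probability `stay`, otherwise switch uniformly to another field
    (requires at least two fields).
    """
    if not tokens:
        return []
    fields = list(lms)
    log_stay = math.log(stay)
    log_switch = math.log((1 - stay) / (len(fields) - 1))

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    toks = [t.lower() for t in tokens]
    # delta[f]: best log score of any label path ending in field f so far.
    delta = {f: math.log(1 / len(fields)) + lms[f](toks[0]) for f in fields}
    backpointers = []
    for tok in toks[1:]:
        new_delta, pointers = {}, {}
        for f in fields:
            prev = max(fields, key=lambda p: delta[p] + trans(p, f))
            new_delta[f] = delta[prev] + trans(prev, f) + lms[f](tok)
            pointers[f] = prev
        backpointers.append(pointers)
        delta = new_delta
    # Follow the back-pointers from the best-scoring final field.
    path = [max(fields, key=delta.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


tokens = "Rana temporaria near Leiden J. Jansen".split()
for token, label in zip(tokens, segment(tokens, train_field_lms(DATABASE))):
    print(token, label)
```

On the toy data this labels the first two tokens `species`, the next two `locality`, and the last two `collector`. Because the emission scores are derived from database contents alone, no annotated field-book text is needed; a fuller system would estimate the transition probabilities from unlabelled text (for example with Baum-Welch) rather than fixing them by hand.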

Sander Canisius, Caroline Sporleder

EMNLP 2007 | Hidden Markov | Information Extraction | Natural Language Processing | Training Data

Related Content

» Turning Lectures into Comic Books Using Linguistically Salient Gestures

» Integrating Information to Bootstrap Information Extraction from Web Sites

» Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment

» Setting up a competition framework for the evaluation of structure extraction from OCRed b...

» Extracting Useful Information from the Full Text of Fiction

» A Bootstrapping Method for Extracting Bilingual Text Pairs

» Overview of the INEX 2009 Book Track

» ICDAR 2009 Book Structure Extraction Competition

» Multiview Bootstrapping for Relation Extraction by Exploring Web Features and Linguistic F...

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	EMNLP
Authors	Sander Canisius, Caroline Sporleder