Towards High Speed Grammar Induction on Large Text Corpora

Abstract. In this paper we describe an efficient and scalable implementation for grammar induction based on the EMILE approach ([2], [3], [4], [5], [6]). The current EMILE 4.1 implementation ([11]) is one of the first efficient grammar induction algorithms that work on free text. Although EMILE 4.1 is far from perfect, it enables researchers to do empirical grammar induction research on various types of corpora. The EMILE approach is based on notions from categorial grammar (cf. [10]), which is known to generate the class of context-free languages. EMILE learns from positive examples only (cf. [1], [7], [9]). We describe the algorithms underlying the approach and some interesting practical results on small and large text collections. As shown in the articles mentioned above, in the limit EMILE learns the correct grammatical structure of a language from sentences of that language. The conducted experiments show that, put into practice, EMILE 4.1 is efficient and scalable. This current implementation le...

Pieter W. Adriaans, Marten Trautwein, Marco Vervoort

Emile | EMILE Approach | Grammar Induction | SOFSEM 2000 | Theoretical Computer Science |

Post Info

Added	25 Aug 2010
Updated	25 Aug 2010
Type	Conference
Year	2000
Where	SOFSEM
Authors	Pieter W. Adriaans, Marten Trautwein, Marco Vervoort
