Sciweavers

NIPS
2003

Unsupervised Context Sensitive Language Acquisition from a Large Corpus

We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed ...
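The graph representation mentioned in the abstract (sentences as paths over word vertices) can be illustrated with a minimal sketch. This is not the authors' implementation: the boundary markers, the function names `build_graph` and `shared_segments`, and the use of raw path counts are all assumptions made for illustration. In particular, the simple count threshold below only stands in for the paper's recursive context-sensitive statistical inference, which the abstract does not specify.

```python
from collections import defaultdict

# Hypothetical boundary markers; the abstract does not say how sentence ends are handled.
BEGIN, END = "<BEGIN>", "<END>"


def build_graph(sentences):
    """Represent each sentence as a path over word vertices.

    Returns (edges, paths), where edges maps (u, v) to the number of
    times a sentence path moves from vertex u to vertex v.
    """
    edges = defaultdict(int)
    paths = []
    for sentence in sentences:
        path = [BEGIN] + sentence.lower().split() + [END]
        paths.append(path)
        for u, v in zip(path, path[1:]):
            edges[(u, v)] += 1
    return dict(edges), paths


def shared_segments(paths, length=2, min_paths=2):
    """Return segments of `length` vertices found in at least `min_paths` paths.

    A plain count threshold, used here only as a placeholder for a real
    significance test over candidate patterns.
    """
    seen = defaultdict(set)
    for i, path in enumerate(paths):
        for j in range(len(path) - length + 1):
            seen[tuple(path[j:j + length])].add(i)
    return {seg: len(ids) for seg, ids in seen.items() if len(ids) >= min_paths}


if __name__ == "__main__":
    edges, paths = build_graph(["the cat sat", "the cat ran"])
    print(edges)
    print(shared_segments(paths))
```

In a fuller system, a segment judged significant would be added to the graph as a new vertex, so that later passes could find patterns built from earlier patterns; the sketch stops short of that step.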
Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where NIPS
Authors Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman