Sciweavers

NAACL
2007

A Random Text Model for the Generation of Statistical Language Invariants

14 years 1 months ago
A Random Text Model for the Generation of Statistical Language Invariants
A novel random text generation model is introduced. Unlike in previous random text models, that mainly aim at producing a Zipfian distribution of word frequencies, our model also takes the properties of neighboring co-occurrence into account and introduces the notion of sentences in random text. After pointing out the deficiencies of related models, we provide a generation process that takes neither the Zipfian distribution on word frequencies nor the small-world structure of the neighboring co-occurrence graph as a constraint. Nevertheless, these distributions emerge in the process. The distributions obtained with the random generation model are compared to a sample of natural language data, showing high agreement also on word length and sentence length. This work proposes a plausible model for the emergence of large-scale characteristics of language without assuming a grammar or semantics.
Chris Biemann
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2007
Where NAACL
Authors Chris Biemann
Comments (0)