Using eigenvectors of the bigram graph to infer morpheme identity

Download www.aclweb.org

This paper describes experiments exploring statistical methods for inferring syntactic categories from a raw corpus in an unsupervised fashion. It has points in common with Brown et al. (1992) and the work that has grown out of it: it employs statistical techniques to derive categories from the words that occur adjacent to a given word. However, we use an eigenvector decomposition of a nearest-neighbor graph to produce a two-dimensional rendering of the words of a corpus in which words of the same syntactic category tend to form clusters and neighborhoods. We exploit this technique to extend the value of automatic learning of morphology. In particular, we look at the suffixes derived from a corpus by unsupervised learning of morphology, and we ask which of these suffixes have a consistent syntactic function (e.g., in English, -ed is primarily a mark of verbal past tense, does but
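The graph-embedding step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `embed_words`, the use of cosine similarity over left and right neighbor counts, the neighborhood size `k`, and the choice of the unnormalized graph Laplacian are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter  # not strictly needed; kept out of the hot path
import numpy as np


def embed_words(tokens, k=2):
    """Return (vocab, coords): a 2-D spectral embedding of the word types.

    Hypothetical helper: the similarity measure, k, and the Laplacian
    variant are illustrative assumptions, not taken from the paper.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)

    # Context vectors: counts of each word's left and right neighbors.
    ctx = np.zeros((n, 2 * n))
    for left, right in zip(tokens, tokens[1:]):
        ctx[index[right], index[left]] += 1      # `left` precedes `right`
        ctx[index[left], n + index[right]] += 1  # `right` follows `left`

    # Cosine similarity between context vectors (assumed measure).
    norms = np.linalg.norm(ctx, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = ctx / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a word is not its own neighbor

    # Symmetric k-nearest-neighbor adjacency matrix.
    adj = np.zeros((n, n))
    nearest = np.argsort(-sim, axis=1)[:, :k]
    for i in range(n):
        adj[i, nearest[i]] = 1.0
    adj = np.maximum(adj, adj.T)

    # Unnormalized graph Laplacian L = D - A (assumed variant).
    lap = np.diag(adj.sum(axis=1)) - adj

    # eigh returns eigenvalues in ascending order; skip the first
    # eigenvector and use the next two as 2-D coordinates.
    _, eigvecs = np.linalg.eigh(lap)
    return vocab, eigvecs[:, 1:3]


if __name__ == "__main__":
    text = "the cat sat on the mat the dog sat on the rug a cat ran a dog ran"
    vocab, coords = embed_words(text.split(), k=2)
    for word, (x, y) in zip(vocab, coords):
        print(f"{word:>4s}  {x:+.3f}  {y:+.3f}")
```

On a real corpus one would typically restrict the vocabulary to the most frequent words and use sparse matrices; the dense arrays here are only meant to keep the sketch short.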

Mikhail Belkin, John A. Goldsmith

CORR 2002 | Education | Raw Corpus | Syntactic Categories | Unsupervised Fashion |

Post Info

Added	18 Dec 2010
Updated	18 Dec 2010
Type	Journal
Year	2002
Where	CORR
Authors	Mikhail Belkin, John A. Goldsmith
