Sciweavers

ACL
1993

Distributional Clustering of English Words

14 years 1 months ago
Distributional Clustering of English Words
We describe and evaluate experimentally a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy between those distributions is used as the similarity measure for clustering. Clusters are represented by average context distributions derived from the given words according to their probabilities of cluster membership. In many cases, the clusters can be thought of as encoding coarse sense distinctions. Deterministic annealing is used to find lowest distortion sets of clusters: as the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word coocurrence, and the models evaluated with respect to held-out test data.
Fernando C. N. Pereira, Naftali Tishby, Lillian Le
Added 02 Nov 2010
Updated 02 Nov 2010
Type Conference
Year 1993
Where ACL
Authors Fernando C. N. Pereira, Naftali Tishby, Lillian Lee
Comments (0)