We describe and experimentally evaluate a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy between those distributions is used as the similarity measure for clustering. Clusters are represented by average context distributions derived from the given words according to their probabilities of cluster membership. In many cases, the clusters can be thought of as encoding coarse sense distinctions. Deterministic annealing is used to find lowest distortion sets of clusters: as the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word cooccurrence, and the models are evaluated with respect to held-out test data.
Fernando C. N. Pereira, Naftali Tishby, Lillian Lee
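
As a rough illustration of the kind of procedure summarized in the abstract, the following sketch performs soft clustering of word context distributions using relative entropy (KL divergence) as the distortion and a free annealing parameter beta. It is a minimal sketch under assumed inputs; the function names (kl, soft_cluster), the toy noun/verb-context counts, and the simple alternating update scheme are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def kl(p, q, eps=1e-12):
        """Relative entropy D(p || q) between two discrete distributions."""
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p /= p.sum()
        q /= q.sum()
        return float(np.sum(p * np.log(p / q)))

    def soft_cluster(word_dists, n_clusters, beta, n_iter=50, seed=0):
        """Soft-cluster rows of word_dists (words x contexts) at inverse temperature beta.

        Returns (memberships, centroids): memberships[w, c] is the probability that
        word w belongs to cluster c; centroids are membership-weighted average
        context distributions, as in the abstract's description of clusters.
        """
        rng = np.random.default_rng(seed)
        n_words, n_contexts = word_dists.shape
        # Initialize centroids as slightly perturbed copies of the overall average distribution.
        avg = word_dists.mean(axis=0)
        centroids = avg + 0.01 * rng.random((n_clusters, n_contexts))
        centroids /= centroids.sum(axis=1, keepdims=True)

        for _ in range(n_iter):
            # Membership step: Boltzmann-style weights over KL distortions at parameter beta;
            # larger beta makes the assignment harder, smaller beta makes it softer.
            dist = np.array([[kl(w, c) for c in centroids] for w in word_dists])
            memberships = np.exp(-beta * dist)
            memberships /= memberships.sum(axis=1, keepdims=True)
            # Centroid step: average context distributions weighted by cluster membership.
            centroids = memberships.T @ word_dists
            centroids /= centroids.sum(axis=1, keepdims=True)
        return memberships, centroids

    if __name__ == "__main__":
        # Toy data: counts of four hypothetical verb contexts for four nouns.
        words = ["wine", "beer", "idea", "theory"]
        counts = np.array([
            [8, 2, 0, 1],   # e.g. drink, pour, propose, discuss
            [9, 1, 0, 0],
            [0, 0, 7, 5],
            [0, 0, 6, 6],
        ], dtype=float)
        dists = counts / counts.sum(axis=1, keepdims=True)
        m, _ = soft_cluster(dists, n_clusters=2, beta=5.0)
        for w, row in zip(words, m):
            print(w, np.round(row, 2))

In this toy run the drinkable nouns and the abstract nouns receive high membership in different clusters; an annealing schedule would correspond to repeating the fit while gradually increasing beta and splitting clusters that become unstable.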