In many applications of natural language processing, it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in a given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on "most similar" words. We describe a probabilistic word association model based on distributional word similarity, and apply it to improving probability estimates for unseen word bigrams in a variant of Katz's back-off model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams.
Ido Dagan, Fernando C. N. Pereira, Lillian Lee
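The following is a minimal sketch, not the authors' implementation, of the general idea the abstract describes: for an unseen bigram (w1, w2), estimate P(w2 | w1) as a similarity-weighted average of P(w2 | w1') over words w1' that are distributionally similar to w1. KL divergence between the words' conditional next-word distributions is used here as one possible dissimilarity measure, turned into a weight via exp(-beta * D); the function names and the parameters beta and k_nearest are illustrative assumptions, not taken from the paper.

```python
# Sketch of similarity-based estimation for unseen bigrams (assumed, simplified).
import math
from collections import Counter, defaultdict

def conditional_dist(bigram_counts, w):
    """MLE distribution P(. | w) from raw bigram counts."""
    following = bigram_counts.get(w, Counter())
    total = sum(following.values())
    return {v: c / total for v, c in following.items()} if total else {}

def kl_divergence(p, q, eps=1e-10):
    """D(p || q), with a small floor so q(v) is never zero."""
    return sum(pv * math.log(pv / q.get(v, eps)) for v, pv in p.items())

def similar_words(bigram_counts, w1, k_nearest=5, beta=1.0):
    """The k words most similar to w1, each with weight exp(-beta * D)."""
    p = conditional_dist(bigram_counts, w1)
    scored = []
    for w in bigram_counts:
        if w == w1:
            continue
        d = kl_divergence(p, conditional_dist(bigram_counts, w))
        scored.append((w, math.exp(-beta * d)))
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:k_nearest]

def similarity_estimate(bigram_counts, w1, w2, k_nearest=5, beta=1.0):
    """Similarity-weighted average of P(w2 | w1') over the neighbors w1' of w1."""
    neighbors = similar_words(bigram_counts, w1, k_nearest, beta)
    total_weight = sum(weight for _, weight in neighbors)
    if total_weight == 0:
        return 0.0
    return sum(weight * conditional_dist(bigram_counts, w).get(w2, 0.0)
               for w, weight in neighbors) / total_weight

# Toy usage: the bigram ("the", "plum") never occurs, but "the" is
# distributionally similar to "a" (both precede "peach" and "pear"),
# so P(plum | the) borrows probability mass from P(plum | a).
counts = defaultdict(Counter)
for w1, w2 in [("eat", "a"), ("a", "peach"), ("a", "pear"), ("ate", "a"),
               ("a", "plum"), ("the", "peach"), ("the", "pear")]:
    counts[w1][w2] += 1
print(similarity_estimate(counts, "the", "plum"))  # nonzero despite zero count
```

In the back-off setting the abstract refers to, an estimate of this kind would be used only when the bigram count is zero, replacing the plain unigram back-off term; the sketch above shows just the similarity-based component.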