Similarity-Based Models of Word Cooccurrence Probabilities

Abstract. In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations “eat a peach” and “eat a beach” is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on “most similar” words. We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation. In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model...
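As a rough illustration of the idea in the abstract, the sketch below estimates the probability of an unseen bigram by averaging the conditional probabilities of the "most similar" words, weighted by distributional similarity. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the toy corpus, the function names, the use of Jensen-Shannon divergence as the dissimilarity measure, the exponential weighting, and the parameters `k` and `beta`. It also leaves out the integration into a back-off language model that the abstract mentions.

```python
import math
from collections import Counter, defaultdict


def conditional_probs(bigrams):
    """Maximum-likelihood estimates P(w2 | w1) from a list of (w1, w2) pairs."""
    counts = defaultdict(Counter)
    for w1, w2 in bigrams:
        counts[w1][w2] += 1
    return {
        w1: {w2: c / sum(ctr.values()) for w2, c in ctr.items()}
        for w1, ctr in counts.items()
    }


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two sparse distributions."""
    total = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = 0.5 * (pw + qw)
        if pw > 0:
            total += 0.5 * pw * math.log(pw / m)
        if qw > 0:
            total += 0.5 * qw * math.log(qw / m)
    return total


def similarity_estimate(w1, w2, probs, k=2, beta=5.0):
    """Weighted average of P(w2 | w1') over the k words w1' closest to w1.

    k and beta are hypothetical tuning parameters chosen for this toy example.
    """
    neighbors = sorted(
        (js_divergence(probs[w1], probs[other]), other)
        for other in probs
        if other != w1
    )[:k]
    # Smaller divergence gives a larger weight.
    weights = [(math.exp(-beta * d), other) for d, other in neighbors]
    norm = sum(wt for wt, _ in weights)
    return sum(wt * probs[other].get(w2, 0.0) for wt, other in weights) / norm


# Toy corpus (hypothetical): "devour peach" and "devour beach" never occur.
bigrams = [
    ("eat", "peach"), ("eat", "apple"), ("eat", "bread"),
    ("devour", "apple"), ("devour", "bread"),
    ("sit", "beach"), ("sit", "chair"),
]
probs = conditional_probs(bigrams)

mle_peach = probs["devour"].get("peach", 0.0)  # unseen, so the MLE is zero
sim_peach = similarity_estimate("devour", "peach", probs)
sim_beach = similarity_estimate("devour", "beach", probs)
```

In this toy setup "devour" shares contexts with "eat" and none with "sit", so the similarity-based estimate favors "devour peach" over "devour beach" even though neither bigram was observed.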

Ido Dagan, Lillian Lee, Fernando C. N. Pereira

CORR 1998 | Education | Unseen Bigrams | Unseen Word Combinations | Word Combinations

» Structural disambiguation of morphosyntactic categorial parsing for Korean

» Automatic image annotation and retrieval using crossmedia relevance models

» Toward understanding natural language directions

Post Info

Added	22 Dec 2010
Updated	22 Dec 2010
Type	Journal
Year	1998
Where	CORR
Authors	Ido Dagan, Lillian Lee, Fernando C. N. Pereira

Sciweavers
