Learning Term-weighting Functions for Similarity Measures


Download research.microsoft.com

Measuring the similarity between two texts is a fundamental problem in many NLP and IR applications. Among existing approaches, the cosine measure of the term vectors representing the original texts has been widely used, where the score of each term is often determined by a TFIDF formula. Despite its simplicity, the quality of such a cosine similarity measure is usually domain dependent and decided by the choice of the term-weighting function. In this paper, we propose a novel framework that learns the term-weighting function. Given labeled pairs of texts as training data, the learning procedure tunes the model parameters by minimizing a specified loss function of the similarity score. Compared to traditional TFIDF term-weighting schemes, our approach shows a significant improvement on tasks such as judging the quality of query suggestions and filtering irrelevant ads for online advertising.
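The sketch below contrasts the TFIDF cosine baseline with a learned term-weighting function trained on labeled text pairs. It is an illustration under stated assumptions, not the paper's method: the abstract does not give the feature set, the form of the weighting function, or the loss. Here we assume a hypothetical per-term weight `sigmoid(theta . phi(term, doc))` with features (bias, log term frequency, IDF), a squared-error loss between the cosine score and a 0/1 label, and a finite-difference gradient. All function names are hypothetical.

```python
import math
from collections import Counter

def idf_table(docs):
    """Smoothed IDF over a tokenized corpus (one common variant, assumed here)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}

def cosine(u, v):
    """Cosine of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def tfidf_vector(doc, idf):
    """Baseline term vector: raw term frequency times IDF."""
    return {t: c * idf.get(t, 1.0) for t, c in Counter(doc).items()}

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def features(term, tf, idf):
    # Hypothetical feature vector phi(term, doc): bias, log TF, IDF.
    return [1.0, math.log1p(tf), idf.get(term, 1.0)]

def learned_vector(doc, idf, theta):
    """Learned term vector; weight = sigmoid(theta . phi) is an assumed form."""
    vec = {}
    for t, c in Counter(doc).items():
        z = sum(a * b for a, b in zip(theta, features(t, c, idf)))
        vec[t] = sigmoid(z)
    return vec

def loss(pairs, idf, theta):
    """Mean squared error between the cosine score and the 0/1 label."""
    total = 0.0
    for a, b, y in pairs:
        s = cosine(learned_vector(a, idf, theta), learned_vector(b, idf, theta))
        total += (s - y) ** 2
    return total / len(pairs)

def train(pairs, idf, steps=200, lr=0.5, eps=1e-5):
    """Gradient descent with finite-difference gradients; returns the best theta seen."""
    theta = [0.0, 0.0, 0.0]
    best_theta, best_loss = theta[:], loss(pairs, idf, theta)
    for _ in range(steps):
        base = loss(pairs, idf, theta)
        if base < best_loss:
            best_theta, best_loss = theta[:], base
        grad = []
        for i in range(len(theta)):
            bumped = theta[:]
            bumped[i] += eps
            grad.append((loss(pairs, idf, bumped) - base) / eps)
        theta = [w - lr * g for w, g in zip(theta, grad)]
    if loss(pairs, idf, theta) < best_loss:
        best_theta = theta[:]
    return best_theta
```

A toy usage: build `idf = idf_table(docs)`, then `train(pairs, idf)` on tuples `(tokens_a, tokens_b, label)`, and score new pairs with `cosine(learned_vector(a, idf, theta), learned_vector(b, idf, theta))`. The finite-difference gradient keeps the sketch short; an analytic gradient of the cosine would be the usual choice at scale.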

Wen-tau Yih


Cosine Similarity Measure | EMNLP 2009 | Natural Language Processing | Texts | TFIDF Term-weighting Schemes |

Post Info

Added	17 Feb 2011
Updated	17 Feb 2011
Type	Conference
Year	2009
Where	EMNLP
Authors	Wen-tau Yih

Sciweavers