WWW
2008
ACM

Unsupervised query segmentation using generative language models and Wikipedia

In this paper, we propose a novel unsupervised approach to query segmentation, an important task in Web search. We use a generative query model to recover the underlying concepts that compose a query's original segmented form. The model's parameters are estimated using an expectation-maximization (EM) algorithm, optimizing the minimum description length objective function on a partial corpus that is specific to the query. To augment this unsupervised learning, we incorporate evidence from Wikipedia. Experiments show that our approach dramatically improves performance over the traditional approach based on mutual information, and produces results comparable to those of a supervised method. In particular, the basic generative language model contributes a 7.4% improvement over the mutual-information-based method (measured by segment F1 on the Intersection test set). EM optimization further improves the performance by 14.3%. Additional knowledge from Wikipedia provides another ...
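To make the idea of segmenting a query under a generative concept model more concrete, here is a minimal sketch, not the authors' implementation. It assumes a unigram model in which a segmentation's probability is the product of its concepts' probabilities, and it finds the most probable segmentation by dynamic programming. The function name `segment`, the parameters `max_len` and `unseen_logprob`, and all probabilities in `toy_logprob` are hypothetical and chosen only for illustration; the EM estimation, the minimum description length objective, and the Wikipedia evidence described in the abstract are not modeled here.

```python
import math


def segment(words, concept_logprob, max_len=3, unseen_logprob=-20.0):
    """Return the most probable segmentation of `words` under a unigram concept model.

    `concept_logprob` maps a concept string to its log-probability.
    `max_len` and `unseen_logprob` are illustrative assumptions, not values from the paper.
    """
    n = len(words)
    # best[i] holds (log-probability, segments) for the best split of words[:i].
    best = [(0.0, [])] + [(-math.inf, [])] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            concept = " ".join(words[start:end])
            # Assumed back-off: an unseen concept is penalised once per word it spans.
            score = concept_logprob.get(concept, unseen_logprob * (end - start))
            total = best[start][0] + score
            if total > best[end][0]:
                best[end] = (total, best[start][1] + [concept])
    return best[n][1]


# Made-up concept probabilities for illustration only (not taken from the paper).
toy_logprob = {
    "new york": math.log(0.02),
    "times": math.log(0.01),
    "new york times": math.log(0.015),
    "subscription": math.log(0.005),
    "new": math.log(0.03),
    "york": math.log(0.002),
}

result = segment("new york times subscription".split(), toy_logprob)
print(result)  # ['new york times', 'subscription']
```

With these toy numbers, treating "new york times" as a single concept scores higher than splitting it into "new york" and "times", because one moderately probable concept beats the product of two. In the approach the abstract describes, such probabilities would be estimated rather than set by hand.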
Bin Tan, Fuchun Peng
Added: 21 Nov 2009
Updated: 21 Nov 2009
Type: Conference
Year: 2008
Where: WWW
Authors: Bin Tan, Fuchun Peng