WWW
2011
ACM

Unsupervised query segmentation using only query logs

We introduce an unsupervised query segmentation scheme that uses query logs as the only resource and can effectively capture the structural units in queries. We believe that Web search queries have a unique syntactic structure which is distinct from that of English or a bag-of-words model. The segments discovered by our scheme help understand this underlying grammatical structure. We apply a statistical model based on Hoeffding’s Inequality to mine significant word n-grams from queries and subsequently use them for segmenting the queries. Evaluation against manually segmented queries shows that this technique can detect rare units that are missed by our Pointwise Mutual Information (PMI) baseline.

Categories and Subject Descriptors H.3.3 [Information Search and Retrieval]: Retrieval Models
General Terms Algorithms, Measurement, Experimentation
Keywords Query Grammar, Query Structure, Unsupervised Query Segmentation, Hoeffding’s Inequality
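
To make the PMI baseline mentioned in the abstract concrete, here is a minimal Python sketch of PMI-based segmentation over a query log. It is an illustration of the general idea only, not the authors' implementation: the function names (`train_counts`, `pmi`, `segment`), the toy log, the maximum-likelihood probability estimates, and the fixed threshold are all assumptions made for this example. The Hoeffding-based method itself is not sketched because the abstract does not give its details.

```python
import math
from collections import Counter


def train_counts(queries):
    """Count unigrams and adjacent-word bigrams in a query log (hypothetical helper)."""
    unigrams, bigrams = Counter(), Counter()
    for q in queries:
        words = q.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def pmi(w1, w2, unigrams, bigrams):
    """Pointwise mutual information (base 2) of an adjacent word pair.

    Probabilities are plain relative frequencies; unseen pairs get -inf.
    """
    joint = bigrams[(w1, w2)]
    if joint == 0:
        return float("-inf")
    p_joint = joint / sum(bigrams.values())
    p1 = unigrams[w1] / sum(unigrams.values())
    p2 = unigrams[w2] / sum(unigrams.values())
    return math.log2(p_joint / (p1 * p2))


def segment(query, unigrams, bigrams, threshold=0.0):
    """Place a segment boundary wherever adjacent-word PMI is at or below threshold.

    The threshold is an assumed tuning parameter, not a value from the paper.
    """
    words = query.lower().split()
    if not words:
        return []
    segments = [[words[0]]]
    for prev, cur in zip(words, words[1:]):
        if pmi(prev, cur, unigrams, bigrams) > threshold:
            segments[-1].append(cur)  # strong association: extend current segment
        else:
            segments.append([cur])    # weak association: start a new segment
    return [" ".join(s) for s in segments]


# Toy query log, invented for illustration.
toy_log = [
    "new york hotels",
    "new york pizza",
    "cheap hotels",
    "cheap pizza",
    "new york",
]
unigrams, bigrams = train_counts(toy_log)
result = segment("new york hotels", unigrams, bigrams, threshold=2.0)
```

A frequency-based score like this depends on a pair being seen often enough to estimate its probability reliably, which is consistent with the abstract's remark that the PMI baseline misses rare units.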
Added 15 May 2011
Updated 15 May 2011
Type Journal
Year 2011
Where WWW
Authors Nikita Mishra, Rishiraj Saha Roy, Niloy Ganguly, Srivatsan Laxman, Monojit Choudhury