Unlike traditional database queries, keyword queries do not adhere to predefined syntax and are often dirty with irrelevant words from natural languages. This makes accurate and efficient keyword query processing over databases a very challenging task. In this paper, we introduce the problem of query cleaning for keyword search queries in a database context and propose a set of effective and efficient solutions. Query cleaning involves semantic linkage and spelling corrections of database relevant query words, followed by segmentation of nearby query words such that each segment corresponds to a high quality data term. We define a quality metric of a keyword query, and propose a number of algorithms for cleaning keyword queries optimally. It is demonstrated that the basic optimal query cleaning problem can be solved using a dynamic programming algorithm. We further extend the basic algorithm to address incremental query cleaning and top-k optimal query cleaning. The incremental query ...
Ken Q. Pu, Xiaohui Yu