
INFOSCALE
2007
ACM

Mining query logs to optimize index partitioning in parallel web search engines

Large-scale Parallel Web Search Engines (WSEs) need to adopt a strategy for partitioning the inverted index among a set of parallel server nodes. In this paper we are interested in devising an effective term-partitioning strategy, in which the global vocabulary of terms and the associated inverted lists are split into disjoint subsets and assigned to distinct servers. Because the skewed distribution of terms in user queries causes workload imbalance, finding an effective partitioning strategy is considered a very complex task. We first formally introduce Term Partitioning as a new optimization problem. Then we show how the knowledge mined from past WSE query logs can be used to feed the objective function of our optimization problem. In particular, this global knowledge comes from the frequent patterns extracted from past usage logs. Finally, we report many results showing that we are able to effectively reduce both the average number of servers act...
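The abstract does not spell out the objective function, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a query log given as lists of terms, uses how often a term appears in the log as a proxy for the load it places on its server, and uses pairwise term co-occurrence counts as a simple stand-in for the mined frequent patterns. Terms are placed greedily, most frequent first, on the server that minimizes current load minus a bonus (weighted by a hypothetical parameter `alpha`) for co-occurring with terms already there. The names `mine_stats`, `partition_terms`, `avg_servers_per_query`, and `alpha` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy query log: each query is a list of terms.
example_log = [
    ["web", "search"],
    ["web", "search", "engine"],
    ["index", "partition"],
    ["index", "partition", "server"],
    ["web", "search"],
]


def mine_stats(queries):
    """Count per-term frequencies and pairwise co-occurrences in the log."""
    term_freq = Counter()
    pair_freq = Counter()
    for q in queries:
        terms = sorted(set(q))
        term_freq.update(terms)
        # Pairs come out in sorted order, so lookups must sort too.
        pair_freq.update(combinations(terms, 2))
    return term_freq, pair_freq


def partition_terms(queries, num_servers, alpha=1.0):
    """Greedy term partitioning (an illustrative heuristic, not the paper's).

    Cost of placing a term on server s = current load + term frequency
    - alpha * (co-occurrences with terms already on s).
    """
    term_freq, pair_freq = mine_stats(queries)
    load = [0] * num_servers
    assignment = {}
    # Most frequent terms first; ties broken alphabetically for determinism.
    for term, freq in sorted(term_freq.items(), key=lambda kv: (-kv[1], kv[0])):

        def cost(s):
            together = sum(
                pair_freq[tuple(sorted((term, other)))]
                for other, srv in assignment.items()
                if srv == s
            )
            return load[s] + freq - alpha * together

        best = min(range(num_servers), key=lambda s: (cost(s), s))
        assignment[term] = best
        load[best] += freq
    return assignment, load


def avg_servers_per_query(queries, assignment):
    """Average number of distinct servers each query has to contact."""
    return sum(len({assignment[t] for t in set(q)}) for q in queries) / len(queries)


if __name__ == "__main__":
    for a in (0.0, 1.0):
        assignment, load = partition_terms(example_log, 2, alpha=a)
        print(a, load, avg_servers_per_query(example_log, assignment))
```

On this toy log, `alpha=0.0` reduces to pure load balancing and yields loads `[6, 6]` with every query touching 2 servers, while `alpha=1.0` yields loads `[7, 5]` with every query answered by a single server. That small trade-off between balance and query locality is the kind of tension an optimization objective fed by mined query-log knowledge has to weigh.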
Added 26 Oct 2010
Updated 26 Oct 2010
Type Conference
Year 2007
Where INFOSCALE
Authors Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Fabrizio Silvestri