Recent work on several important relational database tasks, such as index selection, histogram tuning, approximate query processing, and statistics selection, has recognized the importance of leveraging workloads. Often these tasks are presented with a large workload, i.e., a large set of SQL DML statements, as input. A key factor affecting the scalability of such tasks is the size of the workload. In this paper, we present the novel problem of workload compression, which helps improve the scalability of such tasks. We present a principled solution to this challenging problem. Our solution is broadly applicable to a variety of workload-driven tasks, while allowing for the incorporation of task-specific knowledge. We have implemented this solution, and our experiments illustrate its effectiveness in the context of two workload-driven tasks: index selection and approximate query processing.
Surajit Chaudhuri, Ashish Kumar Gupta, Vivek R. Na