ICDE
2012
IEEE

Exploiting Common Subexpressions for Cloud Query Processing

Many companies now routinely run massive data analysis jobs, expressed in some scripting language, on large clusters of low-end servers. Many analysis scripts are complex and contain common subexpressions, that is, intermediate results that are subsequently joined and aggregated in multiple different ways. Applying conventional optimization techniques to such scripts will produce plans that execute a common subexpression multiple times, once for each consumer, which is clearly wasteful. Moreover, different consumers may have different physical requirements on the result: one consumer may want it partitioned on column A and another on column B. To find a truly optimal plan, the optimizer must trade off such conflicting requirements in a cost-based manner. In this paper we show how to extend a Cascades-style optimizer to correctly optimize scripts containing common subexpressions. The approach has been prototyped in SCOPE, Microsoft’s system for massive data ...
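The trade-off described in the abstract can be illustrated with a toy cost model. The sketch below is not the paper's algorithm and does not model a Cascades-style optimizer; it only compares recomputing a shared subexpression for every consumer against computing it once on one partitioning column and repartitioning for the consumers that need a different one. The names `Consumer`, `plan_costs`, and `best_plan`, and all cost figures, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumer:
    """A consumer of the shared subexpression (hypothetical model)."""
    name: str
    required_partition: str  # column this consumer wants its input partitioned on


def plan_costs(compute_cost, repartition_cost, consumers):
    """Map each candidate plan to its total cost, in made-up cost units."""
    # Conventional plan: run the subexpression once per consumer,
    # each copy produced with the partitioning that consumer requires.
    costs = {"recompute": compute_cost * len(consumers)}

    # Sharing plans: run the subexpression once, partitioned on `column`,
    # and pay a repartition for every consumer that needs another column.
    for column in sorted({c.required_partition for c in consumers}):
        mismatched = sum(1 for c in consumers if c.required_partition != column)
        costs[f"share_on_{column}"] = compute_cost + repartition_cost * mismatched
    return costs


def best_plan(compute_cost, repartition_cost, consumers):
    """Pick the cheapest candidate plan under the toy cost model."""
    costs = plan_costs(compute_cost, repartition_cost, consumers)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    consumers = [Consumer("q1", "A"), Consumer("q2", "A"), Consumer("q3", "B")]
    print(plan_costs(100, 30, consumers))
    print(best_plan(100, 30, consumers))
```

With an expensive subexpression and cheap repartitioning, sharing on the most commonly requested column wins. With a cheap subexpression and expensive repartitioning, recomputing per consumer can be cheaper, which is why the choice has to be cost-based rather than a fixed rule.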
Yasin N. Silva, Per-Åke Larson, Jingren Zhou
Added 28 Sep 2012
Updated 28 Sep 2012
Type Conference
Year 2012
Where ICDE
Authors Yasin N. Silva, Per-Åke Larson, Jingren Zhou