Exploiting Common Subexpressions for Cloud Query Processing



Many companies now routinely run massive data analysis jobs – expressed in some scripting language – on large clusters of low-end servers. Many analysis scripts are complex and contain common subexpressions, that is, intermediate results that are subsequently joined and aggregated in multiple different ways. Applying conventional optimization techniques to such scripts will produce plans that execute a common subexpression multiple times, once for each consumer, which is clearly wasteful. Moreover, different consumers may have different physical requirements on the result: one consumer may want it partitioned on column A and another may want it partitioned on column B. To find a truly optimal plan, the optimizer must trade off such conflicting requirements in a cost-based manner. In this paper we show how to extend a Cascade-style optimizer to correctly optimize scripts containing common subexpressions. The approach has been prototyped in SCOPE, Microsoft’s system for massive data ...
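
The trade-off described in the abstract can be illustrated with a toy cost comparison. The sketch below is not the paper's algorithm and does not model a Cascade-style optimizer; the cost constants and the function names `plan_costs` and `best_plan` are hypothetical, chosen only to show why sharing a subexpression and choosing its partitioning column is a cost-based decision.

```python
# Hypothetical cost numbers, chosen only for illustration (not from the paper).
COMPUTE_COST = 100      # cost to evaluate the common subexpression once
REPARTITION_COST = 30   # cost to repartition its result on another column


def plan_costs(consumer_requirements):
    """Return {plan description: cost} for a few candidate plans.

    consumer_requirements lists, per consumer, the column on which that
    consumer wants the shared result partitioned, e.g. ["A", "B"].
    """
    costs = {}
    # Conventional plan: every consumer re-executes the subexpression,
    # each time partitioned the way that consumer needs.
    costs["recompute per consumer"] = COMPUTE_COST * len(consumer_requirements)
    # Shared plans: execute once, partitioned on one candidate column, and
    # repartition for each consumer whose requirement differs.
    for col in sorted(set(consumer_requirements)):
        mismatches = sum(1 for req in consumer_requirements if req != col)
        costs[f"share, partitioned on {col}"] = (
            COMPUTE_COST + REPARTITION_COST * mismatches
        )
    return costs


def best_plan(consumer_requirements):
    """Pick the cheapest candidate plan under the toy cost model."""
    costs = plan_costs(consumer_requirements)
    return min(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(plan_costs(["A", "B"]))
    print(best_plan(["A", "B", "B"]))
```

Under these assumed costs, sharing beats recomputation, and the preferred partitioning column is the one most consumers require. With different constants (for example, a very expensive repartition) the ranking could flip, which is why the choice has to be made by comparing costs rather than by a fixed rule.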

Yasin N. Silva, Paul-Ake Larson, Jingren Zhou


Cascade Style | Database | ICDE 2012 | Massive Data Analysis | World Scripts |



Added	28 Sep 2012
Updated	28 Sep 2012
Type	Journal
Year	2012
Where	ICDE
Authors	Yasin N. Silva, Paul-Ake Larson, Jingren Zhou
