
SOSP
2009
ACM

Distributed aggregation for data-parallel computing: interfaces and implementations

Data-intensive applications are increasingly designed to execute on large computing clusters. Grouped aggregation is a core primitive of many distributed programming models, and it is often the most efficient available mechanism for computations such as matrix multiplication and graph traversal. Such algorithms typically require nonstandard aggregations that are more sophisticated than traditional built-in database functions such as Sum and Max. As a result, the ease of programming user-defined aggregations, and the efficiency of their implementation, are of great current interest. This paper evaluates the interfaces and implementations for user-defined aggregation in several state-of-the-art distributed computing systems: Hadoop, databases such as Oracle Parallel Server, and DryadLINQ. We show that: the degree of language integration between user-defined functions and the high-level query language has an impact on code legibility and simplicity; the choice of programming interface...
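To make the idea of a user-defined grouped aggregation concrete, here is a minimal Python sketch of a per-key mean, an aggregation that cannot be expressed as a single Sum or Max. It assumes the aggregation can be split into an initial step, an associative combine step, and a final step, so that each partition can be pre-aggregated locally before results are merged. All names (`initial`, `combine`, `final`, `local_aggregate`, `grouped_mean`) are hypothetical and are not taken from the paper or from the Hadoop, Oracle, or DryadLINQ interfaces.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# Partial state for the illustrative "mean" aggregation: (running sum, count).
Partial = Tuple[float, int]
Record = Tuple[Hashable, float]


def initial(value: float) -> Partial:
    """Turn one input value into a partial aggregate."""
    return (value, 1)


def combine(a: Partial, b: Partial) -> Partial:
    """Merge two partial aggregates; associative and commutative."""
    return (a[0] + b[0], a[1] + b[1])


def final(p: Partial) -> float:
    """Turn a fully merged partial aggregate into the result."""
    return p[0] / p[1]


def local_aggregate(records: Iterable[Record]) -> Dict[Hashable, Partial]:
    """Pre-aggregate one partition by key (what a single node might do)."""
    partials: Dict[Hashable, Partial] = {}
    for key, value in records:
        p = initial(value)
        partials[key] = combine(partials[key], p) if key in partials else p
    return partials


def grouped_mean(partitions: List[List[Record]]) -> Dict[Hashable, float]:
    """Merge per-partition partials by key, then finalize each group."""
    merged: Dict[Hashable, Partial] = {}
    for partition in partitions:
        for key, p in local_aggregate(partition).items():
            merged[key] = combine(merged[key], p) if key in merged else p
    return {key: final(p) for key, p in merged.items()}
```

In this sketch only the small `(sum, count)` pairs would need to cross partition boundaries, rather than every raw record. That is the usual motivation for decomposing an aggregation this way.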
Yuan Yu, Pradeep Kumar Gunda, Michael Isard
Added 17 Mar 2010
Updated 17 Mar 2010
Type Conference