A common approach for dealing with large data sets is to stream over the input in one pass, and perform computations using sublinear resources. For truly massive data sets, howeve...
Jon Feldman, S. Muthukrishnan, Anastasios Sidiropo...
Modularity is a central theme in any scalable program analysis. The core idea in a modular analysis is to build summaries at procedure boundaries, and use the summary of a procedu...
Aws Albarghouthi, Rahul Kumar, Aditya V. Nori, Sri...
Data-intensive applications are increasingly designed to execute on large computing clusters. Grouped aggregation is a core primitive of many distributed programming models, and i...
Graphs or networks can be used to model complex systems. Detecting community structures from large network data is a classic and challenging task. In this paper, we propose a nove...
DryadLINQ is a system and a set of language extensions that enable a new programming model for large scale distributed computing. It generalizes previous execution environments su...
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Bud...