Abstract— As the datasets used to fuel modern scientific discovery grow increasingly large, they become increasingly difficult to manage using conventional software. Parallel d...
Sarah Loebman, Dylan Nunley, YongChul Kwon, Bill H...
A major component of many cloud services is query processing on data stored in the underlying cloud cluster. The traditional techniques for query processing on a cluster are those...
Adrian Daniel Popescu, Debabrata Dash, Verena Kant...
Cluster Storage Systems where storage devices are distributed across a large number of nodes are able to reduce the I/O bottleneck problems present in most centralized storage sys...
Parallel dataflow programming frameworks such as Map-Reduce are increasingly being used for large scale data analysis on computing clouds. It is therefore becoming important to a...
In this paper, we discuss some of the lessons that we have learned working with the Hadoop and Sector/Sphere systems. Both of these systems are cloud-based systems designed to sup...