This paper investigates the problem of Partitioning Skew1 in MapReduce-based system. Our studies with Hadoop, a widely used MapReduce implementation, demonstrate that the presence ...
Shadi Ibrahim, Hai Jin, Lu Lu, Song Wu, Bingsheng ...
— MapReduce has become an effective approach to big data analytics in large cluster systems, where SQL-like queries play important roles to interface between users and systems. H...
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to ge...
To achieve high reliability and scalability, most large-scale data warehouse systems have adopted the cluster-based architecture. In this paper, we propose the design of a new clu...
Yuting Lin, Divyakant Agrawal, Chun Chen, Beng Chi...
In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared nothing database. Rather than aborting and restarting queries, our s...
Christopher Yang, Christine Yen, Ceryen Tan, Samue...