We present an automatic skew mitigation approach for user-defined MapReduce programs and present SkewTune, a system that implements this approach as a drop-in replacement for an e...
YongChul Kwon, Magdalena Balazinska, Bill Howe, Je...
Cloud-enabled systems have become a crucial component for efficiently processing and analyzing massive amounts of data. One of the key data processing and analysis operations is the Sim...
There exists a need for high-performance, read-only main-memory database systems for OLAP-style application scenarios. Most of the existing works in this area are centered around t...
We address the problem of making online, parallel query plans fault-tolerant: i.e., provide intra-query fault-tolerance without blocking. We develop an approach that not only achi...
Probabilistic database systems have successfully established themselves as a tool for managing uncertain data. However, much of the research in this area has focused on efficient...
Conventional OLTP systems assign each transaction to a worker thread and that thread accesses data, depending on what the transaction dictates. This thread-to-transaction work ass...
Over the last decade, the cost of producing genomic sequences has dropped dramatically due to the current so-called “next-gen” sequencing methods. However, these next-gen seque...
Ranking queries report the top-K results according to a user-defined scoring function. A widely used scoring function is the weighted summation of multiple scores. Oftentimes, u...
Mohamed A. Soliman, Ihab F. Ilyas, Davide Martinen...
Table partitioning splits a table into smaller parts that can be accessed, stored, and maintained independent of one another. From their traditional use in improving query perform...
To achieve high reliability and scalability, most large-scale data warehouse systems have adopted the cluster-based architecture. In this paper, we propose the design of a new clu...
Yuting Lin, Divyakant Agrawal, Chun Chen, Beng Chi...