Batched stream processing is a new distributed data processing paradigm that models recurring batch computations on incrementally bulk-appended data streams. The model is inspired...
Bingsheng He, Mao Yang, Zhenyu Guo, Rishan Chen, B...
Evaluating the resiliency of stateful Internet services to significant workload spikes and data hotspots requires realistic workload traces that are usually very difficult to obt...
This work addresses the need for stateful dataflow programs that can rapidly sift through huge, evolving data sets. These data-intensive applications perform complex multi-step c...
Dionysios Logothetis, Christopher Olston, Benjamin...
The increasing popularity of cloud storage is leading organizations to consider moving data out of their own data centers and into the cloud. However, success for cloud storage pr...
Hussam Abu-Libdeh, Lonnie Princehouse, Hakim Weath...
Large-scale, user-facing applications are increasingly moving from relational databases to distributed key/value stores for high-request-rate, low-latency workloads. Often, this m...
Michael Armbrust, Nick Lanham, Stephen Tu, Armando...
Over the last 10–15 years, our industry has developed and deployed many large-scale Internet services, from e-commerce to social networking sites, all facing common challenges i...
Emre Kiciman, V. Benjamin Livshits, Madanlal Musuv...
Parallel dataflow programs generate enormous amounts of distributed data that are short-lived, yet are critical for completion of the job and for good run-time performance. We ca...
Steven Y. Ko, Imranul Hoque, Brian Cho, Indranil G...
To address the limitations of centralized shared storage for cloud computing, we are building Lithium, a distributed storage system designed specifically for virtualization workl...