Debugging real systems is hard, requires deep knowledge of the code, and is time-consuming. Bug reports rarely provide sufficient information, thus forcing developers to turn int...
Modern Byzantine fault-tolerant state machine replication (BFT) protocols involve about 20,000 lines of challenging C++ code encompassing synchronization, networking and cryptogra...
Rachid Guerraoui, Nikola Knezevic, Vivien Quéma, ...
Deployment of SSDs in enterprise settings is limited by the low number of erase cycles available on commodity devices. Redundancy solutions such as RAID can potentially be used to protect a...
We propose a mechanism that allows applications to survive operating system kernel crashes and continue functioning with no application data loss after a system reboot. This mecha...
This paper presents a technique that helps automate the reverse engineering of device drivers. It takes a closed-source binary driver, automatically reverse engineers the driver...
As organizations start to use data-intensive cluster computing systems like Hadoop and Dryad for more applications, there is a growing need to share clusters between users. Howeve...
Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma...
Data centers avoid IP Multicast because of a series of problems with the technology. We propose Dr. Multicast (MCMD), a system that maps IPMC operations to a combination of point-...
Cloud computing offers users the ability to access large pools of computational and storage resources on demand. Multiple commercial clouds already allow businesses to replace, or...
In this paper, we propose a management framework for protecting large computer systems against operator mistakes. By detecting and confining mistakes to isolated portions of the ...
Fabio Oliveira, Andrew Tjang, Ricardo Bianchini, R...