Data-intensive applications are increasingly designed to execute on large computing clusters. Grouped aggregation is a core primitive of many distributed programming models, and i...
Bugs in kernel extensions remain one of the main causes of poor operating system reliability despite proposed techniques that isolate extensions in separate protection domains to ...
Miguel Castro, Manuel Costa, Jean-Philippe Martin,...
Commodity computer systems contain more and more processor cores and exhibit increasingly diverse architectural tradeoffs, including memory hierarchies, interconnects, instructio...
Hardware devices can fail, but many drivers assume they do not. When confronted with real devices that misbehave, these assumptions can lead to driver or system failures. While ma...
Asim Kadav, Matthew J. Renzelmann, Michael M. Swif...
Modern computer systems have been built around the assumption that persistent storage is accessed via a slow, block-based interface. However, new byte-addressable, persistent memo...
Jeremy Condit, Edmund B. Nightingale, Christopher ...
We describe Neutron, a version of the TinyOS operating system that efficiently recovers from memory safety bugs. Where existing schemes reboot an entire node on an error, Neutron...
Yang Chen, Omprakash Gnawali, Maria A. Kazandjieva...
Faulty device drivers cause significant damage through downtime and data loss. The problem can be mitigated by an improved driver development process that guarantees correctness...
Leonid Ryzhyk, Peter Chubb, Ihor Kuz, Etienne Le S...
We revisit the problem of scaling software routers, motivated by recent advances in server technology that enable high-speed parallel processing, a feature router workloads appear...
Mihai Dobrescu, Norbert Egi, Katerina J. Argyraki,...
Reproducing bugs is hard. Deterministic replay systems address this problem by providing a high-fidelity replica of an original program run that can be repeatedly executed to zer...