Sciweavers

» Live Debugging of Distributed Systems
SIGSOFT
2010
ACM
13 years 7 months ago
Finding latent performance bugs in systems implementations
Robust distributed systems commonly employ high-level recovery mechanisms enabling the system to recover from a wide variety of problematic environmental conditions such as node f...
Charles Edwin Killian, Karthik Nagaraj, Salman Per...
MOBISYS
2007
ACM
14 years 9 months ago
NodeMD: diagnosing node-level faults in remote wireless sensor systems
Software failures in wireless sensor systems are notoriously difficult to debug. Resource constraints in wireless deployments substantially restrict visibility into the root cause...
Veljko Krunic, Eric Trumpler, Richard Han
ICDCS
2003
IEEE
14 years 3 months ago
Software Fault Tolerance of Distributed Programs Using Computation Slicing
Writing correct distributed programs is hard. In spite of extensive testing and debugging, software faults persist even in commercial grade software. Many distributed systems, esp...
Neeraj Mittal, Vijay K. Garg
SIGMOD
2009
ACM
14 years 10 months ago
Distributed data-parallel computing using a high-level programming language
The Dryad and DryadLINQ systems offer a new programming model for large scale data-parallel computing. They generalize previous execution environments such as SQL and MapReduce in...
Michael Isard, Yuan Yu
EUROSYS
2008
ACM
14 years 7 months ago
BorderPatrol: isolating events for black-box tracing
Causal request traces are valuable to developers of large concurrent and distributed applications, yet difficult to obtain. Traces show how a request is processed, and can be anal...
Eric Koskinen, John Jannotti