
SOSP
2003
ACM

Performance debugging for distributed systems of black boxes

Many interesting large-scale systems are distributed systems of multiple communicating components. Such systems can be very hard to debug, especially when they exhibit poor performance. The problem becomes much harder when systems are composed of “black-box” components: software from many different (perhaps competing) vendors, usually without source code available. Typical solutions-provider employees are not always skilled or experienced enough to debug these systems efficiently. Our goal is to design tools that enable modestly skilled programmers (and experts, too) to isolate performance bottlenecks in distributed systems composed of black-box nodes. We approach this problem by obtaining message-level traces of system activity, as passively as possible and without any knowledge of node internals or message semantics. We have developed two very different algorithms for inferring the dominant causal paths through a distributed system from these traces. One uses timing information...
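To make the idea of a message-level trace concrete, the following is a minimal sketch in Python. It is not either of the paper's algorithms, which this listing does not describe. It assumes a hypothetical `Message` record holding only a timestamp, a sender, and a receiver (no payload or semantics), and a deliberately naive heuristic: a message arriving at a node is guessed to cause every message that node sends within a hypothetical `max_delay` window. All names and the sample trace are illustrative assumptions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One passively observed message; no payload or semantics are recorded."""
    timestamp: float  # seconds, as seen by the tracer (hypothetical field)
    src: str          # sending node
    dst: str          # receiving node


def infer_causal_edges(trace, max_delay=0.1):
    """Count (incoming hop -> outgoing hop) pairs that fall within max_delay.

    Naive heuristic (an assumption, not the paper's method): a message
    arriving at node N is guessed to cause every message that N sends
    within the next max_delay seconds.
    """
    # Group messages by sender so each node's outgoing traffic is easy to scan.
    outgoing = defaultdict(list)
    for msg in sorted(trace, key=lambda m: m.timestamp):
        outgoing[msg.src].append(msg)

    counts = Counter()
    for incoming in trace:
        for out in outgoing[incoming.dst]:
            delay = out.timestamp - incoming.timestamp
            if 0 < delay <= max_delay:
                counts[((incoming.src, incoming.dst), (out.src, out.dst))] += 1
    return counts


# Illustrative trace: two client requests passing through a web tier and a database.
trace = [
    Message(0.000, "client", "web"),
    Message(0.010, "web", "db"),
    Message(0.050, "db", "web"),
    Message(0.055, "web", "client"),
    Message(1.000, "client", "web"),
    Message(1.012, "web", "db"),
    Message(1.060, "db", "web"),
    Message(1.066, "web", "client"),
]

edges = infer_causal_edges(trace)
for (hop_in, hop_out), n in edges.most_common():
    print(hop_in, "->", hop_out, n)
```

On this trace the heuristic also links the client's request directly to the web tier's final reply, because both fall inside the window. That kind of ambiguity is one reason purely timing-based inference needs more care than this sketch takes.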
Added 17 Mar 2010
Updated 17 Mar 2010
Type Conference
Year 2003
Where SOSP
Authors Marcos Kawazoe Aguilera, Jeffrey C. Mogul, Janet L. Wiener, Patrick Reynolds, Athicha Muthitacharoen