Sciweavers

SIGMETRICS
2006
ACM

Stardust: tracking activity in a distributed storage system

Performance monitoring in most distributed systems provides minimal guidance for tuning, problem diagnosis, and decision making. Stardust is a monitoring infrastructure that replaces traditional performance counters with end-to-end traces of requests and allows for efficient querying of performance metrics. Such traces better inform key administrative performance challenges by enabling, for example, extraction of per-workload, per-resource demand information and per-workload latency graphs. This paper reports on our experience building and using end-to-end tracing as an on-line monitoring tool in a distributed storage system. Using diverse system workloads and scenarios, we show that such fine-grained tracing can be made efficient (less than 6% overhead) and is useful for on- and off-line analysis of system behavior. These experiences make a case for having other systems incorporate such an instrumentation framework.
Categories and Subject Descriptors C.4 [Performance of Systems]: ...
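As a rough illustration of what the abstract means by extracting per-workload, per-resource demand and per-workload latency from end-to-end traces, the sketch below aggregates a handful of trace records. It is not Stardust's schema or query interface: the `TraceRecord` fields, the function names, the microsecond units, and the sample data are all assumptions made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    """Hypothetical trace record: one request's time on one resource."""
    request_id: int
    workload: str
    resource: str
    start_us: int  # assumed unit: microseconds
    end_us: int


def demand_by_workload_and_resource(records):
    """Sum the time each workload spent on each resource."""
    demand = defaultdict(int)
    for r in records:
        demand[(r.workload, r.resource)] += r.end_us - r.start_us
    return dict(demand)


def latency_by_request(records):
    """End-to-end latency per request: last end minus first start."""
    spans = {}
    for r in records:
        key = (r.workload, r.request_id)
        first, last = spans.get(key, (r.start_us, r.end_us))
        spans[key] = (min(first, r.start_us), max(last, r.end_us))
    return {key: last - first for key, (first, last) in spans.items()}


# Made-up sample data, for illustration only.
SAMPLE = [
    TraceRecord(1, "oltp", "cpu", 0, 40),
    TraceRecord(1, "oltp", "disk", 40, 140),
    TraceRecord(2, "scan", "disk", 10, 310),
    TraceRecord(2, "scan", "network", 310, 360),
]

if __name__ == "__main__":
    print(demand_by_workload_and_resource(SAMPLE))
    print(latency_by_request(SAMPLE))
```

A plain aggregate counter such as "total disk busy time" could not be split this way after the fact; keeping the workload and request identity on every record is what makes the per-workload breakdown possible in this sketch.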
Added 14 Jun 2010
Updated 14 Jun 2010
Type Conference
Year 2006
Where SIGMETRICS
Authors Eno Thereska, Brandon Salmon, John D. Strunk, Matthew Wachs, Michael Abd-El-Malek, Julio Lopez, Gregory R. Ganger