MIDDLEWARE
2010
Springer

dFault: Fault Localization in Large-Scale Peer-to-Peer Systems

Distributed hash tables (DHTs) have been adopted as a building block for large-scale distributed systems. As a result of this success, their robust operation becomes even more important as mission-critical applications begin to be layered on them. Even though DHTs can detect and heal around unresponsive hosts and disconnected links, several hidden faults and performance bottlenecks go undetected, resulting in unanswered queries and delayed responses. In this paper, we propose dFault, a system that helps large-scale DHTs localize such faults. Given a log of failed queries, called symptoms, and some available information about the hosts in the DHT, dFault identifies the potential root causes (hosts and overlay links) that, with high likelihood, contributed to those symptoms. Its design is based on the recently proposed dependency graph modeling and inference approach for fault localization. We describe the design of dFault, and show that it can accurately localize the root c...
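To make the idea of dependency-graph inference concrete, here is a minimal sketch of one generic way to go from symptoms to suspected root causes. It is not dFault's actual algorithm, which the abstract does not spell out. The sketch assumes each failed query can be mapped to the set of hosts and overlay links it depended on, that components used by a successful query can be treated as healthy, and that a greedy "explain the most symptoms first" heuristic is acceptable. The function name `localize` and the component labels are hypothetical.

```python
from collections import Counter


def localize(failed_paths, ok_paths=()):
    """Greedy root-cause selection over a symptom -> component dependency graph.

    failed_paths: iterable of sets; each set holds the hosts/links that one
                  failed query (a symptom) depended on.
    ok_paths:     iterable of sets for successful queries; as a simplifying
                  assumption, every component they used is treated as healthy.
    Returns the suspected components in the order they were picked.
    """
    healthy = set().union(*ok_paths)
    # Drop exonerated components; symptoms left with no candidates are unexplainable.
    remaining = [set(p) - healthy for p in failed_paths]
    remaining = [p for p in remaining if p]

    suspects = []
    while remaining:
        counts = Counter(c for p in remaining for c in p)
        # Pick the component shared by the most unexplained symptoms;
        # ties are broken alphabetically so the result is deterministic.
        best = min(counts, key=lambda c: (-counts[c], c))
        suspects.append(best)
        remaining = [p for p in remaining if best not in p]
    return suspects


if __name__ == "__main__":
    # Hypothetical symptoms: hosts are "hN", overlay links are "hN-hM".
    failed = [
        {"h1", "h1-h2", "h2"},
        {"h3", "h3-h2", "h2"},
        {"h4", "h4-h5", "h5"},
    ]
    ok = [{"h4", "h4-h6", "h6"}]
    print(localize(failed, ok))
```

In this toy input, host `h2` is shared by two failed queries and is picked first. The third symptom cannot be blamed on `h4`, because a successful query also used it. A probabilistic inference over the same graph would rank candidates by likelihood instead of choosing greedily.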
Added 29 Jan 2011
Updated 29 Jan 2011
Type Journal
Year 2010
Where MIDDLEWARE
Authors Pawan Prakash, Ramana Rao Kompella, Venugopalan Ramasubramanian, Ranveer Chandra