NetPilot: automating datacenter network failure mitigation

The soaring demand for always-on, fast-response online services has driven tremendous growth in modern datacenter networks. These networks often rely on scale-out designs with large numbers of commodity switches to reach immense capacity while keeping capital expenses in check. The downside is that more devices mean more failures, which makes it a formidable challenge for network operators to handle these failures promptly and with minimal disruption to the hosted services. Recent research efforts have focused on automatic failure localization. Yet resolving failures still requires significant human intervention, resulting in prolonged failure recovery times. Unlike previous work, NetPilot aims to quickly mitigate rather than resolve failures. NetPilot mitigates failures in much the same way operators do: by deactivating or restarting suspected offending components. NetPilot circumvents the need for knowing the exact root cause of a failure by taking an intelligent trial-and-er...
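To make the abstract's idea of trial-and-error mitigation concrete, here is a minimal sketch of such a loop. It is not NetPilot's algorithm: the abstract above is truncated and does not describe how NetPilot ranks or evaluates actions. Everything in the code is a hypothetical illustration: the `Action` fields `suspicion` and `impact`, the `max_impact` disruption budget, the `apply`/`rollback`/`failure_persists` callbacks, and the component names `sw-T1`, `link-A3` and `link-A7`. The sketch assumes each candidate action (restart or deactivate a suspected component) comes with a guessed likelihood of being the culprit and an estimated capacity cost. It tries safe actions one at a time, keeps the first one that makes the symptom disappear, and undoes the rest.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    component: str    # hypothetical identifier of a switch or link
    kind: str         # "restart" or "deactivate"
    suspicion: float  # assumed likelihood (0..1) that this component is at fault
    impact: float     # assumed fraction of network capacity lost while applied


def plan(candidates: List[Action], max_impact: float) -> List[Action]:
    """Drop actions that are too disruptive, then order the rest:
    most suspicious first, least disruptive among equally suspicious ones."""
    safe = [a for a in candidates if a.impact <= max_impact]
    return sorted(safe, key=lambda a: (-a.suspicion, a.impact))


def mitigate(
    candidates: List[Action],
    max_impact: float,
    apply: Callable[[Action], None],
    rollback: Callable[[Action], None],
    failure_persists: Callable[[], bool],
) -> Tuple[Optional[Action], List[Action]]:
    """Trial-and-error loop: apply one action at a time, keep the first
    one that clears the symptom, and undo the ones that did not help."""
    tried: List[Action] = []
    for action in plan(candidates, max_impact):
        apply(action)
        tried.append(action)
        if not failure_persists():
            return action, tried  # symptom gone: leave this mitigation in place
        rollback(action)          # no effect: undo before trying the next action
    return None, tried            # no safe action helped; the caller decides what next


# Toy simulation: the (hypothetical) culprit is link-A3, and the failure
# persists for as long as that link stays active.
active = {"sw-T1", "link-A3", "link-A7"}


def apply_action(a: Action) -> None:
    if a.kind == "deactivate":
        active.discard(a.component)
    # a restart is simulated as having no lasting effect on this fault


def rollback_action(a: Action) -> None:
    if a.kind == "deactivate":
        active.add(a.component)  # re-enable the component
    # nothing to undo after a restart


def symptom_present() -> bool:
    return "link-A3" in active


candidates = [
    Action("sw-T1", "restart", suspicion=0.6, impact=0.05),
    Action("link-A3", "deactivate", suspicion=0.5, impact=0.02),
    Action("link-A7", "deactivate", suspicion=0.5, impact=0.01),
    Action("sw-T1", "deactivate", suspicion=0.6, impact=0.30),  # over budget, skipped
]

fix, tried = mitigate(candidates, max_impact=0.10, apply=apply_action,
                      rollback=rollback_action, failure_persists=symptom_present)
print([(a.component, a.kind) for a in tried])
# prints [('sw-T1', 'restart'), ('link-A7', 'deactivate'), ('link-A3', 'deactivate')]
```

Capping each action's estimated impact is one simple way to reflect the abstract's goal of minimal disruption. Rolling back unhelpful actions keeps failed guesses from accumulating and eroding capacity. A real planner would need far better estimates of both suspicion and impact than the fixed numbers assumed here.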

Xin Wu, Daniel Turner, Chao-Chih Chen, David A. Maltz, Xiaowei Yang, Lihua Yuan, Ming Zhang

Real-time Traffic

Communications | Minimal Disruptions | Mitigation Actions | Mitigation Planner | SIGCOMM 2012

Post Info

Added	27 Sep 2012
Updated	27 Sep 2012
Type	Journal
Year	2012
Where	SIGCOMM
Authors	Xin Wu, Daniel Turner, Chao-Chih Chen, David A. Maltz, Xiaowei Yang, Lihua Yuan, Ming Zhang

Sciweavers
