Despite great efforts on the design of ultra-reliable components, the increase of system size and complexity has outpaced the improvement of component reliability. As a result, fa...
Jiexing Gu, Ziming Zheng, Zhiling Lan, John White,...
Abstract. Many large-scale optimization problems rely on graph theoretic solutions; yet high-performance computing has traditionally focused on regular applications with high degre...
Many scientific applications manipulate large amount of data and, therefore, are parallelized on high-performance computing systems to take advantage of their computational power a...
Border Gateway Protocol (BGP) is the standard routing protocol used in the Internet for routing packets between the Autonomous Systems (ASes). It is known that BGP can take hundre...
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...