— A critical problem facing by managing large-scale clusters is to identify the location of problems in a system in case of unusual events. As the scale of high performance compu...
DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart is demonstrated for a wid...
We focus on automatically diagnosing different performance problems in parallel file systems by identifying, gathering and analyzing OS-level, black-box performance metrics on eve...
Michael P. Kasick, Jiaqi Tan, Rajeev Gandhi, Priya...
A new genetic algorithm to detect communities in social networks is presented. The algorithm uses a fitness function able to identify groups of nodes in the network having dense ...
This paper proposes a novel way to use virtual memorymapped communication (VMMC) to reduce the failover time on clusters. With the VMMC model, applications’ virtual address spac...