In production Grids for scientific applications, service and resource failures must be detected and addressed quickly. In this paper, we describe the monitoring infrastructure used by the Earth System Grid (ESG) project, a scientific collaboration that supports global climate research. ESG uses the Globus Toolkit Monitoring and Discovery System (MDS4) to monitor its resources. We describe how the MDS4 Index Service collects information about ESG resources and how the MDS4 Trigger Service checks for specified failure conditions and notifies system administrators when failures occur. We present monitoring statistics for May 2006 and describe our experiences using MDS4 to monitor ESG resources over the last two years.
Ann L. Chervenak, Jennifer M. Schopf, Laura Pearlm