Typical Grid computing scenarios involve many distributed hardware and software components. The more components involved, the more likely it is that one of them will fail. For Grid computing to succeed, there must be a simple mechanism to determine which component failed and why. Instrumenting all Grid applications and middleware is an important part of the solution to this problem. However, it must be possible to control and adapt the amount of instrumentation data produced, so that users and systems are not flooded with it. In this paper we describe a scalable, high-performance instrumentation activation mechanism that addresses this problem.
Dan Gunter, Brian Tierney, Craig E. Tull, Vibha Vi