Abstract. Performance analysis for terascale computing requires a combination of new concepts, including distribution, on-line processing, and automation. As a foundation for tools realizing these concepts, we present a distributed monitoring approach for clustered SMP architectures that tries to minimize the perturbation of the target application while retaining flexibility with respect to filtering and processing of performance data. We achieve this goal by dividing the monitor into a passive monitoring library linked to the application and an active component called the runtime information producer (RIP), which provides performance data (metric- and event-based) for individual nodes. Instead of adding a further layer to the monitoring system that integrates performance data from the individual RIPs, we include a directory service as a third component in our approach. By querying this directory service, tools discover which RIPs provide the data they need.
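The role of the directory service can be illustrated with a minimal sketch. It assumes a simple in-memory registry; the names `RipEntry`, `DirectoryService`, `register`, and `lookup`, as well as the data-kind strings, are hypothetical and are not taken from the actual implementation. The point it shows is that the directory stores only which RIP offers which data, so a tool looks up the relevant RIPs and then contacts them directly.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RipEntry:
    """Hypothetical directory record for one runtime information producer."""
    rip_id: str
    node: str
    data_kinds: frozenset  # e.g. {"metric:cache_misses", "event:mpi_send"}


@dataclass
class DirectoryService:
    """Toy in-memory directory: records who provides what, never the data itself."""
    entries: list = field(default_factory=list)

    def register(self, rip_id, node, data_kinds):
        # A RIP announces the node it serves and the kinds of data it offers.
        self.entries.append(RipEntry(rip_id, node, frozenset(data_kinds)))

    def lookup(self, data_kind, node=None):
        """Return ids of RIPs offering data_kind, optionally restricted to one node."""
        return [
            e.rip_id
            for e in self.entries
            if data_kind in e.data_kinds and (node is None or e.node == node)
        ]


# Two RIPs register (assumed names and data kinds, for illustration only).
directory = DirectoryService()
directory.register("rip-0", "node0", {"metric:cache_misses", "event:mpi_send"})
directory.register("rip-1", "node1", {"metric:cache_misses"})

# A tool asks the directory which RIPs it should contact.
all_cache = directory.lookup("metric:cache_misses")
node1_cache = directory.lookup("metric:cache_misses", node="node1")
send_events = directory.lookup("event:mpi_send")
```

In this sketch the tool would subsequently connect to each returned RIP itself, which is the reason no integrating layer is needed between the RIPs and the tools.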