Tuning parallel applications requires effective tools for detecting performance bottlenecks. During the execution of a parallel program, many individual situations of performance degradation may arise. We believe that exhaustive, time-aware tracing at a fine-grained level is essential to capture such situations. This paper presents a tracing mechanism based on dynamic code interposition and compares it with the usual compiler-directed code injection. Dynamic code interposition adds monitoring code at run time to unmodified binaries and shared libraries, making it suitable for environments in which the compiler or the available tools do not offer instrumentation facilities. Static injection and dynamic interposition are each used to collect detailed traces that feed an analysis tool. Both environments meet the accuracy and performance goals required to profile and analyze parallel applications and runtime libraries.