The complexity of parallel I/O systems stems from a deep I/O stack with many software layers and from concurrent I/O request handling at multiple layers. This paper explores multi-layer event tracing and analysis to pinpoint the system layers responsible for performance problems. Our approach follows two principles: 1) collect generic (layer-independent) events and I/O characteristics to ease the analysis of how I/O characteristics evolve across layers; 2) perform bottom-up trace analysis to take advantage of the relatively easy anomaly identification at lower system layers. Our empirical case study discovered root causes for several anomalous performance behaviors of MPI-IO applications running on a parallel file system. First, we detect an anomaly in the asynchronous I/O implementation in the GNU C runtime library. Additionally, we find that concurrent I/O from multiple MPI processes may induce frequent disk seek/rotation and thus degrade I/O efficiency. We also point out that lack ...