In many areas such as e-commerce, mission-critical N-tier applications have grown increasingly complex. They are characterized by non-stationary workloads (e.g., peak load several times the sustained load) and complex dependencies among the component servers. We have studied N-tier applications through a large number of experiments using the RUBiS and RUBBoS benchmarks. We apply statistical methods such as kernel density estimation, adaptive filtering, and change detection through multiple-model hypothesis tests to analyze more than 200GB of recorded data. Beyond the usual single-bottlenecks, we have observed more intricate bottleneck phenomena. For instance, in several configurations all system components show average resource utilization significantly below saturation, but overall throughput is limited despite addition of more resources. More concretely, our analysis shows experimental evidence of multi-bottleneck cases with low average resource utilization where several resource...