Clustered processors lose performance to clustering-induced stalls, which arise from distributed resources and inter-cluster communication delays. Our performance analysis of clustered architectures shows how previously proposed methods reduce one group of stalls at the expense of the other. We extend previous work and present a new class of cluster assignment heuristics for high-performance clustered processors. We demonstrate that a more balanced approach to clustering-induced stalls improves performance in clustered processors. Our techniques rely on estimating and predicting resource utilization. We show that, on average, our best technique reduces the performance gap between a dual-clustered and a centralized processor to 6.9% for 8-way and 9.2% for 6-way processors on a representative subset of SPEC2K benchmarks.

Categories and Subject Descriptors