Due to increasing power densities, both on-chip average and peak temperatures are fast becoming a serious bottleneck in processor design. This is due to the cost of removing the heat generated, and the performance impact of dealing with thermal emergencies. So far microarchitectural techniques to control temperature have mainly focused on the processor backend (in particular the execution units), whereas the frontend has not received much attention. However, as the temperature of the backend remains controlled and the processor throughput increases, the heat dissipated by the frontend becomes more significant, and one of the major contributors to the total average temperature. This paper proposes and evaluates a distributed frontend for clustered microarchitectures that is able to reduce power density and temperature. First, a distributed mechanism for renaming and committing instructions is proposed. Second, a sub-banked trace cache with a bank hopping mechanism is presented. Finally...