In this paper we present a clustered, multiple-clock domain (CMCD) microarchitecture that combines the benefits of both clustering and Globally Asynchronous Locally Synchronous (GALS) designs. We also present a mechanism for dynamically adapting the frequency and voltage of the frontend of the CMCD with the goal to optimize the energy-delay2 product (ED2P). Our mechanism has minimal hardware cost, is entirely selfadjustable, does not depend on any thresholds, and achieves results close to optimal. We evaluate it on 16 SPEC 2000 applications and report 17.5% ED2P reduction on average (80% of the upper bound).