For multi-gigahertz designs in nanometer technologies, data transfers on global interconnects take multiple clock cycles. In this paper, we propose a regular distributed register (RDR) micro-architecture for multi-cycle on-chip communication. Structurally, an RDR architecture consists of a two-dimensional array of islands, each of which contains a cluster of computational logic and local register files. We also propose a new synthesis methodology based on the RDR architecture, for which we have developed novel layout-driven architectural synthesis algorithms. Applied to several real-life benchmarks, these algorithms improve the clock period by 44% on average and the final latency by 37% on average.

Categories and Subject Descriptors
B.7.2 [Hardware]: INTEGRATED CIRCUITS – Design Aids

General Terms
Algorithms, Performance, Design, Experimentation

Keywords
RDR, multi-cycle communication, deep sub-micron, timing closure, scheduling, b...
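The RDR structure summarized in the abstract, a two-dimensional array of islands that each hold computational logic and a local register file, can be pictured with a small data model. The Python sketch below is illustrative only. The class names `Island` and `RDRGrid`, the method names, and especially the latency model (one extra clock cycle per island hop, i.e. the Manhattan distance between two islands, with no extra cycles inside an island) are hypothetical assumptions made for this example, not definitions taken from the paper. It shows the kind of query a layout-driven scheduler might make: how many cycles a value needs to reach a consumer on another island.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (column, row) position of an island in the array


@dataclass
class Island:
    """One island: a cluster of computational logic plus a local register file."""
    coord: Coord
    # Hypothetical: names of functional units bound to this island.
    functional_units: List[str] = field(default_factory=list)
    # Local register file, modeled as register name -> stored value.
    registers: Dict[str, int] = field(default_factory=dict)


class RDRGrid:
    """A width x height array of islands (illustrative model, not the paper's)."""

    def __init__(self, width: int, height: int) -> None:
        if width <= 0 or height <= 0:
            raise ValueError("grid dimensions must be positive")
        self.width = width
        self.height = height
        self.islands: Dict[Coord, Island] = {
            (x, y): Island((x, y)) for x in range(width) for y in range(height)
        }

    def _check(self, c: Coord) -> None:
        if c not in self.islands:
            raise ValueError(f"island {c} is outside the {self.width}x{self.height} array")

    def transfer_cycles(self, src: Coord, dst: Coord) -> int:
        """Extra clock cycles needed to move a value from src to dst.

        ASSUMPTION (hypothetical): one cycle per island hop, i.e. the
        Manhattan distance; a transfer inside one island needs no extra cycles.
        """
        self._check(src)
        self._check(dst)
        return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

    def send(self, src: Coord, dst: Coord, reg: str) -> int:
        """Copy register `reg` from src's register file into dst's; return the latency."""
        cycles = self.transfer_cycles(src, dst)
        self.islands[dst].registers[reg] = self.islands[src].registers[reg]
        return cycles


if __name__ == "__main__":
    grid = RDRGrid(3, 3)
    grid.islands[(0, 0)].registers["r0"] = 7
    print(grid.send((0, 0), (2, 1), "r0"))  # 3 extra cycles under the assumed model
```

Keeping each island's register file local, and charging cycles only when a value crosses islands, mirrors the motivation stated above: communication across the chip is multi-cycle, so a synthesis tool must know where operations and registers are placed before it can schedule them.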