In the sub-micron technology era, wire delays are becoming much more important than gate delays, which makes clustered designs particularly attractive. A common form of clustering in processors replaces the centralized instruction scheduler with multiple smaller schedulers that work in parallel within a single chip. Studies have found that the existing interconnects connecting on-chip clusters, as well as the proposed instruction distribution algorithms, are not scalable. The objective of this paper is to investigate alternate interconnects (specifically, hierarchical interconnects) that provide scalable performance as the number of on-chip clusters increases. We also investigate the distribution algorithms best suited to these interconnects. Experimental results show that these new interconnects, combined with appropriate distribution techniques, are more scalable than existing techniques and achieve an IPC around 15-20% higher than the most scalable existing configuration,...