Latency-insensitive systems were recently proposed by Carloni et al. as a correct-by-construction methodology for single-clock system-on-a-chip (SoC) design using predesigned IP blocks. Their approach overcomes the problem of long latencies of global interconnects in deep-submicron technologies, while still maintaining much of the inherent simplicity of synchronous design. In particular, wires whose latency is greater than a clock cycle are segmented using "relay stations," and IP blocks are made robust to arbitrary communication delays. This paper shows, however, that significant extensions are needed to make latency-insensitive systems useful for the practical design of large-scale SoC's. In particular, this paper proposes three extensions. The first extension allows each synchronous module to treat its input and output channels in a much more flexible manner, i.e., with greater decoupling. The second extension generalizes inter-module communication from point-to-poin...