Ever increasing complexity and heterogeneity of SoC platforms require diversified on-chip communication schemes beyond the currently omnipresent shared bus architectures. To prevent time consuming design changes late in the design flow, we propose the early exploration of the on-chip communication architecture to meet performance and cost requirements. Based on SystemC 2.0.1 we have defined a modular exploration framework, which is able to capture the effect on performance for different on-chip networks like dedicated point-to-point, shared bus, and crossbar topologies. Monitoring of performance parameters like utilization, latency and throughput drives the mapping of the inter-module traffic to an efficient communication architecture. The effectiveness of our approach is demonstrated by the exemplary design of a high performance Network Processing Unit (NPU), which is compared against a commercial NPU device. Categories and Subject Descriptors B.8.2 [Performance And Reliability]...