With increasing demands for high performance in embedded systems, especially in digital signal processing applications, embedded processors must increase the available instruction-level parallelism (ILP) within significant constraints on power consumption and chip cost. Clustered VLIW (Very Long Instruction Word) architectures can be used to address this problem. Their clusters must communicate through some mechanism so that data in one cluster can be used by other clusters whenever required. We selected four different inter-cluster communication mechanisms from the category of fully connected, deadlock-free networks. We then designed and implemented VLIW processors employing these mechanisms and analyzed them against several performance parameters: cycle time, cycle count overhead, instruction size, and gate count. The effects of changing the data width and the number of registers in a register file are also analyzed.