Several multithreading techniques have been proposed to reduce resource underutilization in Very Long Instruction Word (VLIW) processors. Simultaneous MultiThreading (SMT) is a popular technique that improves processor performance by issuing instructions from multiple threads in the same cycle. SMT requires extra hardware to merge instructions from different threads, and the complexity of this hardware increases substantially with the number of threads, limiting the number of threads that can realistically be supported to two. Cluster-level Simultaneous MultiThreading (CSMT) is a technique that merges instructions from different threads at the cluster level. CSMT has a much lower merging hardware cost and can support a larger number of threads; however, its performance is lower than that of SMT. In this paper, we evaluate several hardware designs that can support a large number of threads by using a merging scheme that combines SMT and CSMT merging. For instance, one of the evaluated schemes, which me...