Abstract— In this paper we address the architectural exploration of a VLIW processor for embedded systems from the energy/performance point of view. We also consider an architectural modification that extends the reference processor so that it can exploit both instruction-level parallelism and thread-level parallelism. A power model obtained by applying an instruction-level power estimation technique is presented and validated against experimental results. This power model was plugged into a parametric cycle-accurate simulator to support architectural exploration. Experimental results derived from the proposed framework compare different implementations of the reference processor: single-cluster and dual-cluster implementations, and a dual-cluster implementation with the multithreaded extension.