This work proposes a new architecture and execution model called 2D-VLIW. The architecture executes large pieces of computation over a matrix of functional units connected by a set of local registers spread across the matrix. Experiments using the Mediabench and SPECint00 programs and the Trimaran compiler show performance gains ranging from 5% to 63% over an EPIC architecture with the same number of registers and functional units. We also show that the g721-enc program running on a 2D-VLIW 3×3 matrix