Abstract—Reducing resource usage is one of the most important optimization objectives in behavioral synthesis due to its direct impact on power, performance and cost. The datapat...
Despite their superior performance for multimedia applications, vector processors have three limitations that hinder their widespread acceptance. First, the complexity and size of...
Although domain-specialized FPGAs can offer significant area, speed and power improvements over conventional reconfigurable devices, there are several unique and unexplored design...
— Architectural resources and program recurrences are the main limitations to the amount of Instruction-Level Parallelism (ILP) exploitable from loops, the most time-consuming pa...
ILP (Instruction Level Parallelism) processors are being increasingly used in embedded systems. In embedded systems, instructions may be subject to timing constraints. An optimisi...
This work proposes a new architecture and execution model called 2D-VLIW. This architecture adopts an execution model based on large pieces of computation running over a matrix of...
—A shared bus is a suitable structure for minimizing the interconnections costs in system synthesis. It has also been shown that the word-length of Functional Units has a great i...
This paper describes a new basis for the implementation of a shifter functional unit. We present a design based on the inverse butterfly and butterfly datapath circuits that perfo...