High-performance microprocessors are currently designed to exploit the inherent instruction-level parallelism (ILP) available in most applications. The techniques used in their de...
Delayed branching is a technique to alleviate branch hazards without expensive hardware branch prediction mechanisms. For VLIW processors with deep pipelines and many issue slots,...
State-of-the-art communication standards make extensive use of turbo codes. The complex and power-consuming designs that currently implement the turbo decoder expose the need for ...
In this paper we describe a software pipelining framework, CALiBeR (Cluster Aware Load Balancing Retiming Algorithm), suitable for compilers targeting clustered embedded VLIW proc...
The high transistor density afforded by modern VLSI processes has enabled the design of embedded processors that use clustered execution units to deliver high levels of performan...