Current superscalar processors, both RISC and CISC, require substantial instruction fetch and decode bandwidth to keep multiple functional units utilized. While CISC instructions ...
A series of recent algorithmic advances has delivered highly effective methods for pairing evaluation and parameter generation. However, the resulting multitude of options means m...
Efficient architecture exploration and design of application specific instruction-set processors (ASIPs) requires retargetable software development tools, in particular C compil...
Jianjiang Ceng, Manuel Hohenauer, Rainer Leupers, ...
MicroSIMD architectures incorporating subword parallelism are very efficient for application-specific media processors as well as for fast multimedia information processing in gen...
Delayed branching is a technique to alleviate branch hazards without expensive hardware branch prediction mechanisms. For VLIW processors with deep pipelines and many issue slots,...