Delayed branching is a technique to alleviate branch hazards without expensive hardware branch prediction mechanisms. For VLIW processors with deep pipelines and many issue slots,...
To keep up with a large degree of instruction level parallelism (ILP), the Itanium 2 cache systems use a complex organization scheme: load/store queues, banking and interleaving. ...
William Jalby, Christophe Lemuet, Sid Ahmed Ali To...
One of the upcoming challenges in embedded processing is to incorporate an increasing amount of adaptivity in order to respond to the multifarious constraints induced by today’s...
Lars Bauer, Muhammad Shafique, Stephanie Kreutz, J...
In this paper, we propose a multithreaded processor architecture which improves machine throughput. In our processor architecture, instructions from different threads (not a singl...
Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. For example, when targeting shared-mem...
Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay ...