We propose two new instructions, swperm and sieve, that can be used to efficiently complete an arbitrary bit-level permutation of an n-bit word with or without repetitions. Permut...
For simulation, a tradeoff exists between speed and accuracy. The more instructions simulated from the workload, the more accurate the results — but at a higher cost. To reduce ...
For maximum performance, an out-of-order processor must issue load instructions as early as possible, while avoiding memory-order violations with prior store instructions that wri...
- We present techniques for exploiting parallelism extracted from loops on an MIMD system. Parallelism is exploited through parallel execution of instructions on multiple processor...
Fast instruction decoding is a challenge for the design of CISC microprocessors. A well-known solution to overcome this problem is using a trace cache. It stores and fetches alrea...