Serializing instructions (SIs), such as writes to control registers, have many complex dependencies and are difficult to execute out of order (OoO). To avoid unnecessary complexity, processors often serialize the pipeline to maintain sequential semantics for these instructions. We observe frequent SIs across several system-intensive workloads and three ISAs: SPARC V9, X86-64, and PowerPC. As explained by Amdahl's Law, these SIs, which create serial regions within the instruction-level parallel execution of a single thread, can have a significant impact on performance. For the SPARC ISA (after removing TLB and register window effects), we show that operating system (OS) code incurs an 8–45% performance drop from SIs. We observe that the values produced by most control register writes are quickly consumed, but the writes are often effectively useless (EU), i.e., they do not actually change the execution of the consuming instructions. We propose EU prediction, which allows younger i...
Philip M. Wells, Gurindar S. Sohi