Parallel programs are known to be difficult to analyze. A key reason is that they typically have an enormous number of execution interleavings, or schedules. Static analysis over...
Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfen...
Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural m...
Lori Carter, Beth Simon, Brad Calder, Larry Carter...
—Performance degradation of memory-intensive programs caused by the LRU policy’s inability to handle weaklocality data accesses in the last level cache is increasingly serious ...
Composable multicore systems merge multiple independent cores for running sequential single-threaded workloads. The performance scalability of these systems, however, is limited d...
Behnam Robatmili, Madhu Saravana Sibi Govindan, Do...
As semiconductor feature sizes decrease, interconnect delay is becoming a dominant component of processor cycle times. This creates a critical need to shift microarchitectural des...