A virtual instruction set architecture (V-ISA) implemented via a processor-specific software translation layer can provide great flexibility to processor designers. Recent examp...
Vikram S. Adve, Chris Lattner, Michael Brukman, An...
We describe the Slice Processor micro-architecture that implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operation...
Andreas Moshovos, Dionisios N. Pnevmatikatos, Amir...
Abstract—Many studies have shown that load imbalancing causes significant performance degradation in High Performance Computing (HPC) applications. Nowadays, Multi-Threaded (MT1...
Carlos Boneti, Roberto Gioiosa, Francisco J. Cazor...
The performance of most embedded systems is critically dependent on the average memory access latency. Improving the cache hit rate can have significant positive impact on the per...
Distributed processors must balance communication and concurrency. When dividing instructions among the processors, key factors are the available concurrency, criticality of depen...
Behnam Robatmili, Katherine E. Coons, Doug Burger,...