In this paper, we introduce PTLsim, a cycle accurate full system x86-64 microprocessor simulator and virtual machine. PTLsim models a modern superscalar out of order x86-64 proces...
Today’s chip-level multiprocessors (CMPs) feature up to a hundred discrete cores, and with increasing levels of integration, CMPs with hundreds of cores, cache tiles, and specia...
Boris Grot, Joel Hestness, Stephen W. Keckler, Onu...
This paper investigates helper threads that improve performance by prefetching data on behalf of an application’s main thread. The focus is data prefetch helper threads that lac...
Reconfigurable Processors utilize a reconfigurable fabric (to implement application-specific accelerators) and may perform runtime reconfigurations to exchange the set of deployed...
As the number of cores and threads in manycore compute accelerators such as Graphics Processing Units (GPU) increases, so does the importance of on-chip interconnection network des...