Multiprocessor architectures that continuously execute atomic blocks (or chunks) of instructions can improve performance and software productivity. However, all of the prior propo...
The least recently used (LRU) replacement policy performs poorly in the last-level cache (LLC) because temporal locality of memory accesses is filtered by first and second level...
Samira Manabi Khan, Zhe Wang, Daniel A. Jimé...
Conventional out-of-order processors that use a unified physical register file allocate and reclaim registers explicitly using a free list that operates as a circular queue. We ...
Steven Battle, Andrew D. Hilton, Mark Hempstead, A...
Data races are a major contributor to parallel software unreliability. A type of race that is both common and typically harmful is the Asymmetric data race. It occurs when at leas...
Modern memory systems rely on spatial locality to provide high bandwidth while minimizing memory device power and cost. The trend of increasing the number of cores that share memo...
Min Kyu Jeong, Doe Hyun Yoon, Dam Sunwoo, Mike Sul...