Many programs exploit shared-memory parallelism using multithreading. Threaded codes typically use locks to coordinate access to shared data. In many cases, contention for locks r...
Nathan R. Tallent, John M. Mellor-Crummey, Allan P...
In the search for high performance, most transactional memory (TM) systems execute atomic blocks concurrently and must thus be prepared for data conflicts. The TM system must then...
Future CMPs will combine many simple cores with deep cache hierarchies. With more cores, cache resources per core are fewer, and must be shared carefully to avoid poor utilization...
Junli Gu, Steven S. Lumetta, Rakesh Kumar, Yihe Su...
Hyperobjects (reducers) provide a linguistic abstraction for dynamic multithreading that allows different branches of a parallel program to maintain coordinated local views of the...
I.-Ting Angelina Lee, Aamir Shafi, Charles E. Leis...
This paper focuses on generating efficient software pipelined schedules for in-order machines, which we call Converged Trace Schedules. For a candidate loop, we form a string of t...