This paper presents COBRA (Continuous Binary ReAdaptation), a runtime binary optimization framework, for multithreaded applications. It is currently implemented on Itanium 2 based...
Transactional Memory (TM) simplifies parallel programming by supporting atomic and isolated execution of user-identified tasks. To date, TM programming has required the use of l...
Woongki Baek, Chi Cao Minh, Martin Trautmann, Chri...
OpenMP allows programmers to specify nested parallelism in parallel applications. In the case of scientific applications, parallel loops are the most important source of paralleli...
Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...
Abstract. We describe compiler and run-time optimisations for effective autoparallelisation of C++ programs on the Cell BE architecture. Auto-parallelisation is made easier by anno...