Data prefetching technique is widely used to bridge the growing performance gap between processor and memory. Numerous prefetching techniques have been proposed to exploit data pa...
OpenMP has emerged as a widely accepted standard for writing shared memory programs. Hardware-specific extensions such as data placement are usually needed to improve the scalabi...
This paper presents the first analysis of operating system execution on a simultaneous multithreaded (SMT) processor. While SMT has been studied extensively over the past 6 years,...
It has been shown that a small number of FPGAs can significantly accelerate certain computing tasks by up to two or three orders of magnitude. However, particularly intensive lar...
Arun Patel, Christopher A. Madill, Manuel Salda&nt...
As we move to large manycores, the hardware-based global checkpointing schemes that have been proposed for small shared-memory machines do not scale. Scalability barriers include ...