Energy consumption by computer systems has emerged as an important concern. However, the energy consumed in executing an algorithm cannot be inferred from its performance alone: i...
Multiprocessors based on processors with multiple cores usually include a non-uniform memory architecture (NUMA); even current 2-processor systems with 8 cores exhibit non-uniform...
Latency tolerance is essential in achieving high performance on parallel computers for remote function calls and fine-grained remote memory accesses. EM-X supports interprocessor ...
—With the appearance of massively parallel and inexpensive platforms such as the G80 generation of NVIDIA GPUs, more real-life applications will be designed or ported to these pl...
High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communic...