Over the last decade, Message Passing Interface (MPI) has become a very successful parallel programming environment for distributed memory architectures such as clusters. However, ...
Three-dimensional integration has the potential to improve the communication latency and integration density of chip-level multiprocessors (CMPs). However, the stacked highpower de...
Changyun Zhu, Zhenyu (Peter) Gu, Li Shang, Robert ...
The availability of large quantities of processors is a crucial enabler of many-task computing. Voluntary computing systems have proven that it is possible to build computing plat...
Rostand Costa, Francisco V. Brasileiro, Guido Lemo...
Instruction combining is an optimization to replace a sequence of instructions with a more efficient instruction yielding the same result in a fewer machine cycles. When we use it...
While deterministic replay of parallel programs is a powerful technique, current proposals have shortcomings. Specifically, software-based replay systems have high overheads on mu...
Pablo Montesinos, Matthew Hicks, Samuel T. King, J...