Blocked-execution multiprocessor architectures continuously run atomic blocks of instructions — also called Chunks. Such architectures can boost both performance and software pr...
The performance attained by parallel programs executed on multiprocessor systems is largely in uenced both by the characteristics of the code and by those of the system architectu...
Emergence of chip multiprocessor systems has dramatically increased the performance potential of computer systems. Since the amount of exploited parallelism is directly influenced ...
This paper presents reduction recognition and parallel code generationstrategies for distributed-memorymultiprocessors. We describe techniques to recognize a broad range of implic...
Shared memory is an appealing abstraction for parallel programming. It must be implemented with caches in order toperform well, however, and caches require a coherence mechanism t...