We present a parallel data processor centered around a programming model of so called Parallelization Contracts (PACTs) and the scalable parallel execution engine Nephele [18]. Th...
Future CMPs will combine many simple cores with deep cache hierarchies. With more cores, cache resources per core are fewer, and must be shared carefully to avoid poor utilization...
Junli Gu, Steven S. Lumetta, Rakesh Kumar, Yihe Su...
Buffered CoScheduled (BCS) MPI is a novel implementation of MPI based on global synchronization of all system activities. BCS-MPI imposes a model where all processes and their com...
This paper compares several implementations of entry consistency (EC) and lazy release consistency (LRC), two relaxed memory models in use with software distributed shared memory ...
Sarita V. Adve, Alan L. Cox, Sandhya Dwarkadas, Ra...
With current FPGAs, designers can now instantiate several embedded processors, memory units, and a wide variety of IP blocks to build a single-chip, high-performance multiprocesso...