In this work we investigate how the compiler technique of message strip mining performs in practice on contemporary high performance networks. Message strip mining attempts to redu...
We present preliminary results of a project to create a tuning system that adaptively optimizes programs to the underlying execution platform. We will show initial results from tw...
Abstract— We developed an automated environment to measure the memory access behavior of applications on high performance clusters. Code optimization for processor caches is cruc...
Orca is a portable, object-based distributed shared memory system. This paper studies and evaluates the design choices made in the Orca system and compares Orca with other DSMs. T...
Henri E. Bal, Raoul Bhoedjang, Rutger F. H. Hofman...
We consider the filter decomposition problem in supporting coarse-grained pipelined parallelism. This form of parallelism is suitable for data-driven applications in scenarios wh...