Current parallelizing compilers for message-passing machines only support a limited class of data-parallel applications. One method for eliminating this restriction is to combine ...
Efficient use of an optimized custom memory hierarchy to exploit temporal locality in the memory accesses on array signals can have a very large impact on the power consumption i...
Tanja Van Achteren, Rudy Lauwereins, Francky Catth...
Abstract-- Networks-on-Chip will serve as the central integration platform in future complex SoC designs, composed of a large number of heterogeneous processing resources. Most res...
The problem of writing high performance parallel applications becomes even more challenging when irregular, sparse or adaptive methods are employed. In this paper we introduce com...
This paper focuses on non-strict processing, optimization, and partial evaluation of MPI programs which use incremental data structures (ISs). We describe the design and implement...