Abstract. Nested data-parallel programs often have large memory requirements due to their high degree of parallelism. Piecewise execution is an implementation technique used to minimize the space needed. In this paper, we present a combination of piecewise execution and loop-fusion techniques. We present both a formal framework and a thread-based execution model. We give some experimental results, which demonstrate good performance in both memory consumption and execution time.
W. Pfannenstiel