Abstract. We present a strategy to develop, in a functional setting, correct, e cient and portable Divide-and-Conquer (DC) programs for massively parallel architectures. Starting from an operational DC program, mapping sequences to sequences, we apply a set of semantics preserving transformation rules, which transform the parallel control structure of DC into a sequential control ow, thereby making the implicit data parallelism in a DC scheme explicit. In the next phase of our strategy, the parallel architecture is fully expressed, where `architecture dependent' higher-order functions are introduced. Then { due to the rising communication complexities on particular architectures { topology dependent communication patterns are optimized in order to reduce the overall communication costs. The advantages of this approach are manifold and are demonstrated with a set of non-trivial examples.