Data-intensive applications, e.g. in the life sciences, pose new efficiency challenges for the service composition problem. Since computing power today is increased mainly by adding CPU cores, algorithms have to be redesigned to benefit from this evolution. In this paper we present a framework for parallelizing service composition algorithms and investigate how to partition the composition problem into multiple parallel threads. Counterintuitively, however, our baseline evaluation reveals that straightforward parallelization techniques do not lead to superior performance. To harness the full power of multi-core architectures, we propose two novel approaches that distribute the workload evenly across threads. Our extensive experiments on practical life science data show a speedup of over 300% using only 4 cores. Moreover, we show that our techniques can also benefit from all of the advanced pruning heuristics used in sequential algorithms.