Middleware for web service orchestration, such as runtime engines for executing business processes, workflows, or web service compositions, can easily become a performance bottleneck as the number of concurrent service requests increases. Many existing process execution engines address scalability through distribution and replication techniques. However, modern multicore machines, which comprise several chip multiprocessors, each offering multiple cores and often a large shared cache, offer the opportunity to redesign the architecture of process execution engines to take full advantage of the underlying hardware resources. In this paper we present a new process execution engine architecture. Its design takes into account the specific constraints of multicore machines, and our extensive performance evaluation shows that it scales well on different processor architectures. A key feature of the design is self-configuration at startup a...