Purely functional programs should run well on parallel hardware because of the absence of side effects, but it has proved hard to realise this potential in practice. Plenty of papers describe promising ideas, but vastly fewer describe real implementations with good wall-clock performance. We describe just such an implementation, and quantitatively explore some of the complex design tradeoffs that make such implementations hard to build. Our measurements are necessarily detailed and specific, but they are reproducible, and we believe that they offer some general insights.

Categories and Subject Descriptors: D.3.2 [Programming Languages]: Language Classifications--Applicative (functional) languages; D.3.2 [Programming Languages]: Language Classifications--Concurrent, distributed and parallel languages; D.3.3 [Programming Languages]: Language Constructs and Features--Concurrent programming structures; D.3.4 [Programming Languages]: Processors--Run-time environments

General Terms: Languages, ...
Simon Marlow, Simon L. Peyton Jones, Satnam Singh