We argue in this paper that benchmarking should be complemented by direct measurement of parallelisation overheads when evaluating parallel state-space exploration algorithms. This poses several challenges that so far have not been addressed in the literature: what exactly are those overheads, how can and cannot they be measured, and how should system models be selected in order to expose the causes of parallelisation (in)efficiencies? We discuss and answer these questions based on our experience with parallelising Saturation