We present an evaluation strategy for clock synchronization algorithms. It combines measured traces, which provide realistic performance estimates, with simulation, which guarantees repeatability. The strategy includes parameter optimization to allow a fair comparison of algorithms; a general-purpose evolutionary optimizer is used for this purpose. We apply the strategy in a case study that evaluates the performance of four clock synchronization algorithms in the wireless loudspeaker application. We find that the phase-locked loop, linear-regression, and gradient algorithms achieve sufficient synchronization in a lightly loaded network. Only the local selection algorithm maintains sufficient synchronization under heavy network load, such as that generated by concurrent audio or video streaming.
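The sketch below illustrates the general idea of trace-driven evaluation with evolutionary parameter tuning. It is not the authors' framework, and several parts are assumptions. A seeded synthetic delay-jitter sequence (`make_trace`, hypothetical) stands in for a measured trace. A simple proportional-integral clock loop (`simulate`, hypothetical) stands in for a phase-locked-loop-style algorithm, with gains `kp` and `ki`. A (1+1) evolution strategy with a step-size rule close to the 1/5 success rule (`one_plus_one_es`, hypothetical) serves as one possible general-purpose evolutionary optimizer. Because the trace and the optimizer are both seeded, replaying them gives identical results, which is the repeatability property the abstract attributes to simulation.

```python
import random


def make_trace(n=500, mean_jitter=2e-4, seed=42):
    """Hypothetical stand-in for a measured trace: per-message delay jitter (s)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean_jitter) for _ in range(n)]


def simulate(params, jitter, drift=50e-6, period=1.0):
    """Replay a jitter trace through a PI-controlled slave clock (assumed model).

    Returns the mean absolute slave-minus-master offset in seconds,
    or infinity if the control loop diverges.
    """
    kp, ki = params
    offset = 0.0     # true slave-minus-master offset (s)
    rate_corr = 0.0  # frequency correction applied by the slave (s/s)
    integ = 0.0      # running sum of measured offsets (s)
    total = 0.0
    for j in jitter:
        # Over one sync period the slave drifts, minus its current correction.
        offset += (drift - rate_corr) * period
        # The offset estimate is biased by the (non-negative) delay jitter.
        measured = offset + j
        integ += measured
        rate_corr = (kp * measured + ki * integ) / period
        if abs(offset) > 1.0:
            return float("inf")  # diverged: reject these parameters
        total += abs(offset)
    return total / len(jitter)


def one_plus_one_es(cost, x0, sigma=0.1, iters=300, seed=0):
    """(1+1) evolution strategy: keep the mutant only if it improves the cost."""
    rng = random.Random(seed)
    best_x, best_f = list(x0), cost(x0)
    for _ in range(iters):
        # Gaussian mutation, clipped so gains stay non-negative.
        cand = [max(0.0, xi + rng.gauss(0.0, sigma)) for xi in best_x]
        f = cost(cand)
        if f < best_f:
            best_x, best_f = cand, f
            sigma *= 1.5  # success: widen the search
        else:
            sigma *= 0.9  # failure: narrow it (equilibrium near 1/5 success rate)
    return best_x, best_f


if __name__ == "__main__":
    trace = make_trace()
    gains, err = one_plus_one_es(lambda p: simulate(p, trace), [0.1, 0.01])
    print("tuned (kp, ki):", gains, "mean |offset| (s):", err)
```

Because the strategy is elitist, the tuned error can never exceed the error of the starting parameters. Running each competing algorithm through the same optimizer on the same trace is what makes a comparison between algorithms fair rather than dependent on hand-picked settings.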