Accurate, reproducible and comparable measurement of the overheads, communication times and progression behavior of blocking and nonblocking collective operations is a complicated task. Although Different measurement schemes for blocking collective operations are implemented in well-known benchmarks, many of these schemes introduce different systematic errors in their measurements. We characterize these errors and select a window-based approach as the most accurate method. However, this approach complicates measurements significantly and introduces clock synchronization as a new source of errors. We analyze approaches to avoid or correct those errors and develop a scalable synchronization scheme to conduct benchmarks on massively parallel systems. Our results are compared to the window-based scheme implemented in the SKaMPI benchmarks and show a reduction of the synchronization overhead by a factor of 16 on 128 processes. We also describe two different measurement schemes for the ove...