The evaluation of the quality of segmentations of an image, and the assessment of intra- and inter-expert variability in segmentation performance, have long been recognized as difficult tasks. Recently, an Expectation Maximization (EM) algorithm for Simultaneous Truth and Performance Level Estimation (STAPLE) was developed to compute, from a set of segmentations of an image, both an estimate of the reference standard segmentation and the performance parameters of each segmentation. Performance is characterized by the rate at which each expert detects each segmentation label, measured against the estimated reference standard. This previous work provides estimates of the performance parameters but gives no information about their uncertainty. An estimate of this inferential uncertainty, if available, would allow confidence intervals to be computed for the parameter values, aid in interpreting the performance of segmentation generators, and help determine if sufficient data size and number...
Olivier Commowick, Simon K. Warfield
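The EM procedure behind STAPLE can be illustrated in its simplest binary form, where each expert's detection rates reduce to a sensitivity p_j (foreground correctly labeled) and a specificity q_j (background correctly labeled). The following is a minimal sketch under stated assumptions, not the authors' implementation: labels are binary, a single global prior P(T = 1) is shared by all voxels, initial rates are set to 0.99, and the function name `staple_binary` and its parameters are hypothetical. It estimates the reference standard and the performance parameters only; it does not compute the inferential uncertainty this work addresses.

```python
from math import prod


def staple_binary(decisions, prior=None, max_iter=100, tol=1e-8, eps=1e-6):
    """Hypothetical minimal binary STAPLE sketch: EM over a hidden true label.

    decisions: list of rows, one per voxel, each a list of 0/1 rater labels.
    prior: P(T = 1), assumed identical for every voxel; by default the
        fraction of all decisions equal to 1 (an assumption of this sketch).
    Returns (W, p, q): per-voxel posterior P(T_i = 1 | D), and per-rater
    sensitivity p_j and specificity q_j.
    Assumes both labels occur in the data, so that 0 < prior < 1.
    """
    n_voxels, n_raters = len(decisions), len(decisions[0])
    if prior is None:
        prior = sum(map(sum, decisions)) / (n_voxels * n_raters)

    def clip(x):
        # Keep rates strictly inside (0, 1) so every likelihood factor is > 0.
        return min(max(x, eps), 1 - eps)

    p = [0.99] * n_raters  # initial sensitivity guess for each rater
    q = [0.99] * n_raters  # initial specificity guess for each rater
    for _ in range(max_iter):
        # E-step: posterior probability that each voxel's true label is 1.
        W = []
        for row in decisions:
            a = prior * prod(p[j] if d else 1 - p[j] for j, d in enumerate(row))
            b = (1 - prior) * prod(1 - q[j] if d else q[j] for j, d in enumerate(row))
            W.append(a / (a + b))
        # M-step: detection rates measured against the soft reference W.
        w_fg = sum(W)             # expected number of foreground voxels
        w_bg = n_voxels - w_fg    # expected number of background voxels
        p_new = [clip(sum(w * row[j] for w, row in zip(W, decisions)) / w_fg)
                 for j in range(n_raters)]
        q_new = [clip(sum((1 - w) * (1 - row[j]) for w, row in zip(W, decisions)) / w_bg)
                 for j in range(n_raters)]
        # Stop when no rate changes by more than tol.
        delta = max(abs(x - y) for x, y in zip(p_new + q_new, p + q))
        p, q = p_new, q_new
        if delta < tol:
            break
    return W, p, q


# Hypothetical toy data: 10 voxels, 3 raters. Rater 1 misses one foreground
# voxel; rater 2 adds one false-positive voxel.
decisions = [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 0, 1], [0, 0, 0],
             [0, 0, 0], [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]
W, p, q = staple_binary(decisions)
print([round(w) for w in W], [round(x, 2) for x in p], [round(x, 2) for x in q])
```

With many raters, the products in the E-step can underflow; accumulating log-likelihoods instead is the usual remedy.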