Automatic localisation of correspondences for the construction of Statistical Shape Models from examples has been the focus of intense research during the last decade. Several algorithms are available, and benchmarking is needed to rank them. Prior work has argued that the quality of the models produced by these algorithms can be evaluated by measuring compactness, generality and specificity. In this paper, severe problems with these standard measures are analysed, both theoretically and experimentally, on natural as well as synthetic datasets. We also propose that a Ground Truth Correspondence Measure (GCM) be used for benchmarking, and we benchmark several state-of-the-art algorithms on seven real datasets and one synthetic dataset.

1 Background

Statistical shape modeling [11] has turned out to be a very effective tool in image segmentation and image interpretation. A major drawback is that a dense correspondence between the shapes in the training...