Active shape models are a powerful and widely used tool for interpreting complex image data. By building models of shape variation, they enable search algorithms to exploit a priori knowledge efficiently. However, because PCA is linear, non-linearities in the data, such as rotations or independently moving subparts, can degrade the resulting model considerably. Although non-linear extensions of active shape models have been proposed and application-specific solutions have been used, they still require a certain amount of user interaction during model building. In this paper, the task of building and choosing optimal models is tackled in a more generic, information-theoretic fashion. In particular, we propose an algorithm based on the minimum description length principle to find an optimal subdivision of the data into subparts, each adequate for linear modeling. This results in an overall more compact model configuration, which in turn leads to a better model in terms of modes ...