Recent results cast doubt on the assumption that improvements in fused recognition accuracy for speaker recognition systems based on different acoustic features are due mainly to the different origins of those features (e.g. magnitude, phase, or modulation information). In this study, we use clustering comparison measures to investigate the acoustic and speaker modelling aspects of the speaker recognition task separately, and demonstrate that front-end diversity can be achieved purely through different ‘partitioning’ of the acoustic space. Further, features that exhibit good ‘stability’ under repeated clustering are shown to also give good equal error rate (EER) performance in speaker recognition. This has implications for feature choice, for the fusion of systems employing different features, and for universal background model (UBM) data selection. For the latter problem, we present a method that gives up to an 11% relative reduction in EER while using only 20–30% of the usual UBM training data set.