This paper describes unsupervised speech/speaker cluster validity measures based on a dissimilarity metric. The measures serve two purposes: estimating the number of clusters in a speech data set and assessing the consistency of the clustering procedure. The number of clusters is estimated by minimizing the cross-data dissimilarity values, while algorithm consistency is evaluated by calculating the dissimilarity values across multiple experimental runs. The method is demonstrated on the task of beluga whale vocalization clustering.
Kuntoro Adi, Kristine E. Sonstrom, Peter M. Scheif
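
The two uses of the dissimilarity measure summarized in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the clustering algorithm (plain k-means), the dissimilarity (the fraction of point pairs on which two labelings disagree, i.e. one minus the Rand index), the random half-split used to obtain "cross-data" labelings, and all function names (`kmeans`, `assign`, `dissimilarity`, `cross_data_dissimilarity`, `estimate_num_clusters`, `run_consistency`) are hypothetical stand-ins for whatever the authors actually use.

```python
import numpy as np

def assign(X, centroids):
    """Label each row of X with the index of its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in clusterer); returns centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def dissimilarity(a, b):
    """Assumed metric: fraction of point pairs on which two labelings disagree."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float(np.mean(same_a[iu] != same_b[iu]))

def cross_data_dissimilarity(X, k, seed=0):
    """Cluster two random halves separately, then compare how the two
    resulting models label the same held-out half."""
    X = np.asarray(X, dtype=float)
    perm = np.random.default_rng(seed).permutation(len(X))
    half_a, half_b = X[perm[: len(X) // 2]], X[perm[len(X) // 2:]]
    labels_from_a = assign(half_b, kmeans(half_a, k, seed=seed))
    labels_from_b = assign(half_b, kmeans(half_b, k, seed=seed))
    return dissimilarity(labels_from_a, labels_from_b)

def estimate_num_clusters(X, candidates, n_runs=5):
    """Return the candidate k with the lowest mean cross-data dissimilarity."""
    scores = {
        k: float(np.mean([cross_data_dissimilarity(X, k, seed=s)
                          for s in range(n_runs)]))
        for k in candidates
    }
    return min(scores, key=scores.get), scores

def run_consistency(X, k, n_runs=5):
    """Mean pairwise dissimilarity between labelings from differently
    seeded runs on the same data (lower means more consistent)."""
    runs = [assign(np.asarray(X, dtype=float), kmeans(X, k, seed=s))
            for s in range(n_runs)]
    pairs = [dissimilarity(runs[i], runs[j])
             for i in range(n_runs) for j in range(i + 1, n_runs)]
    return float(np.mean(pairs))

if __name__ == "__main__":
    # Synthetic 2-D points standing in for acoustic feature vectors.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(c, 0.1, size=(30, 2))
                   for c in ([0, 0], [5, 0], [0, 5])])
    k_hat, scores = estimate_num_clusters(X, candidates=[2, 3, 4])
    print("estimated k:", k_hat, "scores:", scores)
    print("run-to-run dissimilarity at k_hat:", run_consistency(X, k_hat))
```

The candidate range in this sketch starts at 2 because a single cluster trivially yields zero dissimilarity under the pair-counting measure assumed here. A measure of this kind can also tie across several candidate values, in which case the sketch simply returns the first minimum.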