Clustering has been studied extensively as a way to better represent the diversity of text and image search results. In this paper, we extend this methodology to the novel domain of music search. We conduct an empirical evaluation of different clustering algorithms, audio feature representations, and the incorporation of lyrics for music clustering. Our evaluation shows that fusing audio and text features yields the best clustering accuracy.
Yi-Hsuan Yang, Yu-Ching Lin, Homer H. Chen
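
As an illustration of the audio and text fusion the abstract describes, the following is a minimal sketch and not the authors' implementation. It assumes early fusion (standardizing each modality and concatenating the vectors) followed by plain k-means with a deterministic farthest-first initialization. The abstract does not state which fusion scheme or clustering algorithms were used, so both choices, along with the toy feature values and the names `zscore`, `fuse`, and `kmeans`, are hypothetical.

```python
import math


def zscore(rows):
    """Standardize each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def fuse(audio, text):
    """Early fusion (assumed): standardize each modality, then concatenate."""
    return [a + t for a, t in zip(zscore(audio), zscore(text))]


def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iters=50):
    """Plain k-means with a deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data for four songs.
# Audio features: [tempo in BPM, energy]; lyric features: counts of two terms.
audio = [[120.0, 0.8], [124.0, 0.9], [70.0, 0.2], [68.0, 0.3]]
lyrics = [[5.0, 0.0], [4.0, 1.0], [0.0, 6.0], [1.0, 5.0]]

labels = kmeans(fuse(audio, lyrics), k=2)
print(labels)
```

On this toy data the first two songs fall into one cluster and the last two into another. Standardizing before concatenation keeps a modality with large raw values, such as tempo, from dominating the distance computation.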