The k-means algorithm is a popular clustering method used in many different fields of computer science, such as data mining, machine learning and information retrieval. However, the k-means algorithm is very likely to converge to some local optimum which is much worse than the desired global optimal solution. To overcome this problem, current k-means algorithm and its variants usually run many times with different initial centers to avoid being trapped in local optimums that are of unacceptable quality. In this paper, we propose an efficient method to compute a lower bound on the cost of the local optimum from the current center set. After every k-means iteration, k-means algorithm can halt the procedure if the lower bound of the cost at the future local optimum is worse than the best solution that has already been computed so far. Although such a lower bound computation incurs some extra time consumption in the iterations, extensive experiments on both synthetic and real data sets ...
Zhenjie Zhang, Bing Tian Dai, Anthony K. H. Tung