Rearranging data objects for efficient and stable clustering



When a data mining algorithm derives a partitional structure from a data set, it is not unusual for the outcome to change when the algorithm is run on the same data in a different order. This is known as the order bias problem. To overcome it, a first clustering pass constructs an initial partition, which is expected to indicate the possible range in the number of final clusters. We then apply center sorting to the data objects in the clusters of that partition to rearrange them in a new order, and reapply the same clustering procedure to the rearranged data set to build a new partition. We have developed an algorithm, REIT, that achieves both efficiency and reliability. A number of experiments have been performed to show that the algorithm helps minimize order bias effects.

Categories and Subject Descriptors: I.2.6 [Artificial Intelligence]: Learning – concept learning.

General Terms: Algorithms, Performance, Experimentation.

Keywords: Data ord...
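The cluster, rearrange, recluster loop described in the abstract can be illustrated with a minimal Python sketch. This is not the REIT algorithm itself, whose details the abstract does not give. It assumes a simple one-pass "leader" clustering as the order-sensitive base procedure, and it assumes that center sorting means ordering each cluster's objects by distance to that cluster's centroid, nearest first. The function names and the `threshold` parameter are hypothetical.

```python
import math


def centroid(members):
    """Mean point of a non-empty list of equal-length numeric tuples."""
    n = len(members)
    return tuple(sum(coords) / n for coords in zip(*members))


def leader_cluster(points, threshold):
    """Order-sensitive one-pass clustering (assumed stand-in procedure).

    Each point joins the first existing cluster whose current centroid
    lies within `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for p in points:
        for members in clusters:
            if math.dist(p, centroid(members)) <= threshold:
                members.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def center_sort(clusters):
    """Assumed reading of center sorting: within each cluster, order the
    objects by distance to the cluster centroid (nearest first), then
    concatenate the clusters into one rearranged data set."""
    ordered = []
    for members in clusters:
        c = centroid(members)
        ordered.extend(sorted(members, key=lambda p: math.dist(p, c)))
    return ordered


def recluster(points, threshold):
    """Initial partition, rearrangement, then the same procedure again."""
    initial = leader_cluster(points, threshold)
    return leader_cluster(center_sort(initial), threshold)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
    for cluster in recluster(data, threshold=1.0):
        print(cluster)
```

Under these assumptions, the second pass sees each cluster's most central objects first, which is one plausible way a rearrangement could make the final partition less dependent on the original input order.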

Gyesung Lee, Xindong Wu, Jinho Chon
