Using Category-Based Adherence to Cluster Market-Basket Data

15 years 11 months ago


In this paper, we devise an efficient algorithm for clustering market-basket data. Unlike traditional data, market-basket data are characterized by high dimensionality, sparsity, and massive outliers. Clustering transactions across different levels of the taxonomy is of great importance for marketing strategies as well as for representing the results of clustering techniques for market-basket data. In view of these features, we devise a novel measurement, called the category-based adherence, and use it to perform the clustering. The distance of an item to a given cluster is defined as the number of links between this item and its nearest large node in the taxonomy tree, where a large node is an item (i.e., leaf) or a category (i.e., internal) node whose occurrence count exceeds a given threshold. The category-based adherence of a transaction to a cluster is then defined as the average di...
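The item-to-cluster distance described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the taxonomy is a hypothetical toy tree stored as a child-to-parent map, a category's occurrence count is taken to be the number of transactions in the cluster containing at least one of its descendant items, and the "nearest large node" is searched for only among the item itself and its ancestors. The names `PARENT`, `occurrence_counts`, and `item_distance` are made up for this example.

```python
from collections import Counter

# Hypothetical toy taxonomy (assumption): child -> parent; "root" has no parent.
PARENT = {
    "cola": "soft drinks",
    "lemonade": "soft drinks",
    "soft drinks": "beverages",
    "beer": "beverages",
    "beverages": "root",
    "bread": "bakery",
    "bakery": "root",
}


def ancestors(node, parent):
    """Yield the node itself, then each ancestor up to the root."""
    while node is not None:
        yield node
        node = parent.get(node)


def occurrence_counts(cluster, parent):
    """Count, per taxonomy node, the transactions that contain it.

    Assumption: a category node occurs in a transaction if any of its
    descendant items does; each transaction counts at most once per node.
    """
    counts = Counter()
    for transaction in cluster:
        covered = set()
        for item in transaction:
            covered.update(ancestors(item, parent))
        counts.update(covered)
    return counts


def item_distance(item, counts, parent, threshold):
    """Number of links from the item up to its nearest large node.

    A node is large when its count exceeds the threshold. Returns None
    when no node on the path to the root is large.
    """
    for links, node in enumerate(ancestors(item, parent)):
        if counts[node] > threshold:
            return links
    return None


cluster = [{"cola", "bread"}, {"cola", "beer"}, {"lemonade"}]
counts = occurrence_counts(cluster, PARENT)
for item in ("cola", "lemonade", "bread"):
    print(item, item_distance(item, counts, PARENT, threshold=1))
```

With a threshold of 1, `cola` is itself large (distance 0), `lemonade` reaches the large category `soft drinks` after one link, and `bread` has to climb two links to the root. Restricting the search to ancestors keeps the lookup linear in the depth of the tree; a different reading of "nearest" would need a different traversal.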

Ching-Huang Yun, Kun-Ta Chuang, Ming-Syan Chen
