High Quality, Efficient Hierarchical Document Clustering Using Closed Interesting Itemsets

High dimensionality remains a significant challenge for document clustering. Recent approaches used frequent itemsets and closed frequent itemsets to reduce dimensionality and to improve the efficiency of hierarchical document clustering. In this paper, we introduce the notion of “closed interesting” itemsets (i.e., closed itemsets with high interestingness). We provide heuristics such as “super item” to efficiently mine these itemsets and show that they provide significant dimensionality reduction over closed frequent itemsets. Using “closed interesting” itemsets, we propose a new hierarchical document clustering method that outperforms state-of-the-art agglomerative, partitioning, and frequent-itemset-based methods in terms of both FScore and Entropy, without requiring dataset-specific parameter tuning. We evaluate twenty interestingness measures on nine standard datasets and show that when used to generate “closed interesting” itemsets, and to select parent nodes, Mu...
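
As a rough illustration of the idea in the abstract, the sketch below enumerates itemsets over a toy corpus, keeps those that are closed (no proper superset occurs in exactly the same documents), and then filters them by an interestingness threshold. Everything here is an assumption made for illustration: the function names, the toy documents, the thresholds, and especially the use of a lift-style score as the interestingness measure. It is not the paper's mining procedure and does not include the “super item” heuristic.

```python
from itertools import combinations
from math import prod


def count(itemset, docs):
    """Number of documents that contain every item in `itemset`."""
    return sum(1 for d in docs if itemset <= d)


def is_closed(itemset, docs):
    """An itemset is closed if it equals the intersection of all
    documents that contain it (no superset has the same support)."""
    covering = [d for d in docs if itemset <= d]
    return bool(covering) and frozenset.intersection(*covering) == itemset


def lift(itemset, docs):
    """Illustrative interestingness score (an assumption, not the paper's
    measure): joint frequency divided by the product of item frequencies."""
    n = len(docs)
    joint = count(itemset, docs) / n
    independent = prod(count(frozenset({i}), docs) / n for i in itemset)
    return joint / independent


def closed_interesting(docs, min_count, min_lift, max_size=3):
    """Brute-force search, suitable for toy data only."""
    items = sorted(set().union(*docs))
    found = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if count(s, docs) < min_count or not is_closed(s, docs):
                continue
            score = lift(s, docs)
            if score >= min_lift:
                found[s] = score
    return found


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    frozenset({"data", "mining", "cluster"}),
    frozenset({"data", "mining"}),
    frozenset({"data", "mining", "cluster"}),
    frozenset({"cluster", "tree"}),
    frozenset({"tree"}),
]

result = closed_interesting(docs, min_count=2, min_lift=1.5)
for itemset, score in result.items():
    print(sorted(itemset), round(score, 2))
```

On this toy corpus, four itemsets are closed and occur in at least two documents, but only two of them pass the score threshold, which hints at how an interestingness filter can shrink the feature space further than closedness alone.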

Hassan H. Malik, John R. Kender

Closed Frequent Itemsets | Data Mining | Document Clustering | Hierarchical Document Clustering | ICDM 2006 |

Added	11 Jun 2010
Updated	11 Jun 2010
Type	Conference
Year	2006
Where	ICDM
Authors	Hassan H. Malik, John R. Kender
