Sciweavers

KDD
2001
ACM

Efficient discovery of error-tolerant frequent itemsets in high dimensions

We present a generalization of frequent itemsets allowing the notion of errors in the itemset definition. We motivate the problem and present an efficient algorithm that identifies error-tolerant frequent clusters of items in transactional data (customer-purchase data, web browsing data, text, etc.). This efficient algorithm exploits sparsity of the underlying data to find large groups of items that are correlated over database records (rows). The notion of transaction coverage allows us to extend the algorithm and view it as a fast clustering algorithm for discovering segments of similar transactions in binary sparse data. We evaluate the new algorithm on three real-world applications: clustering high-dimensional data, query selectivity estimation and collaborative filtering. Results show that we consistently uncover structure in large sparse databases that other more traditional clustering algorithms in data mining fail to find. 26th International Conference on Very Large Databases,...
Cheng Yang, Usama M. Fayyad, Paul S. Bradley
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2001
Where KDD
Authors Cheng Yang, Usama M. Fayyad, Paul S. Bradley