Summarising Data by Clustering Items


Abstract. For a book, the title and abstract provide a good first impression of what to expect from it. For a database, getting a first impression is not so straightforward. While low-order statistics only provide limited insight, mining the data quickly provides too much detail. In this paper we propose a middle ground, and introduce a parameter-free method for constructing high-quality summaries for binary data. Our method builds a summary by grouping items that strongly correlate, and uses the Minimum Description Length principle to identify the best grouping, without requiring a distance measure between items. Besides offering a practical overview of which attributes interact most strongly, these summaries are also easily queried surrogates for the data. Experiments show that our method discovers high-quality results: correlated attributes are correctly grouped and the supports of frequent itemsets are closely approximated.
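
As a rough illustration of the idea in the abstract, the sketch below scores a grouping of binary items by a two-part description length and compares candidate groupings on toy data. It is not the authors' encoding: the functions `data_cost`, `model_cost` and `total_cost`, the crude model cost of log2(n + 1) bits per possible value combination, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter

def data_cost(rows, group):
    """Bits needed to encode the columns in `group` for all rows, using the
    empirical distribution of their joint value combinations."""
    counts = Counter(tuple(row[i] for i in group) for row in rows)
    n = len(rows)
    return -sum(c * math.log2(c / n) for c in counts.values())

def model_cost(rows, group):
    """Assumed, simplified model cost: one count in 0..n (log2(n + 1) bits)
    for each of the 2^k possible value combinations of a k-item group."""
    return (2 ** len(group)) * math.log2(len(rows) + 1)

def total_cost(rows, grouping):
    """Two-part description length of the data under a grouping of items."""
    return sum(data_cost(rows, g) + model_cost(rows, g) for g in grouping)

# Toy binary data: items 0 and 1 always take the same value,
# item 2 is independent of both.
rows = [(a, a, b) for a in (0, 1) for b in (0, 1)] * 25

separate = [(0,), (1,), (2,)]   # every item in its own group
merged = [(0, 1), (2,)]         # correlated items grouped together
all_in_one = [(0, 1, 2)]        # everything in a single group

for name, grouping in [("separate", separate), ("merged", merged),
                       ("all_in_one", all_in_one)]:
    print(name, round(total_cost(rows, grouping), 2))
```

On this toy data, grouping the two perfectly correlated items gives the shortest total description. Lumping all three items together costs more, because the larger model is not repaid by any saving on the data. No distance measure between items is involved; candidate groupings are compared only by their encoded size.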

Michael Mampaey, Jilles Vreeken

Correlated Attributes | Data Mining | Minimum Description Length | PKDD 2010 | First Impression

Added	29 Jan 2011
Updated	29 Jan 2011
Type	Journal
Year	2010
Where	PKDD
Authors	Michael Mampaey, Jilles Vreeken
