Mining Compressing Sequential Patterns


Download www.win.tue.nl

Compression-based pattern mining has been successfully applied to many data mining tasks. We propose an approach based on the minimum description length principle to extract sequential patterns that compress a database of sequences well. We show that mining compressing patterns is NP-Hard and belongs to the class of inapproximable problems. We propose two heuristic algorithms for mining compressing patterns. The first uses a two-phase approach similar to Krimp for itemset data. To overcome the performance problems caused by the required candidate generation, we propose GoKrimp, an effective greedy algorithm that directly mines compressing patterns. We conduct an empirical study on six real-life datasets to compare the proposed algorithms by run time, compressibility, and classification accuracy, using the patterns found as features for SVM classifiers.
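The idea of picking patterns by how much they shorten the encoded database can be illustrated with a toy sketch. This is not the paper's encoding and not GoKrimp: it assumes a simplified cost (one unit per symbol, plus pattern length + 1 per dictionary entry), considers only contiguous pairs of symbols, and uses hypothetical names (`encoded_length`, `replace_pair`, `greedy_compress`) and generated codes such as `P0` that are assumed not to occur in the data.

```python
def encoded_length(db, dictionary):
    """Toy description length: symbols in the data plus dictionary cost.

    Assumption: each dictionary entry costs len(pattern) + 1
    (the pattern's symbols plus one unit for its code).
    """
    data_cost = sum(len(seq) for seq in db)
    dict_cost = sum(len(pattern) + 1 for pattern in dictionary.values())
    return data_cost + dict_cost


def replace_pair(seq, pair, code):
    """Replace non-overlapping occurrences of `pair`, left to right, with `code`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(code)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def greedy_compress(db):
    """Greedily add the adjacent pair that shortens the toy encoding the most.

    Stops as soon as no candidate pair reduces the total length.
    """
    db = [list(seq) for seq in db]
    dictionary = {}
    while True:
        # All adjacent pairs currently present, in a deterministic order.
        pairs = sorted({p for seq in db for p in zip(seq, seq[1:])})
        best = None
        best_len = encoded_length(db, dictionary)
        code = f"P{len(dictionary)}"  # hypothetical fresh code symbol
        for pair in pairs:
            cand_db = [replace_pair(seq, pair, code) for seq in db]
            cand_dict = {**dictionary, code: pair}
            cand_len = encoded_length(cand_db, cand_dict)
            if cand_len < best_len:
                best, best_len = (cand_db, cand_dict), cand_len
        if best is None:
            return db, dictionary
        db, dictionary = best


sequences = ["abab", "abab", "cab"]
compressed, patterns = greedy_compress(sequences)
print(encoded_length([list(s) for s in sequences], {}))  # 11
print(encoded_length(compressed, patterns))              # 9
print(patterns)
```

Under this toy cost a pair only pays off when it occurs at least four times, which is why the sketch stops after extracting a single pattern from the example data.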

Hoang Thanh Lam, Fabian Moerchen, Dmitriy Fradkin, Toon Calders


Data Mining | Greedy Algorithm | Length Principle | Minimum Description Length | SDM 2012 |


Post Info

Added	29 Sep 2012
Updated	29 Sep 2012
Type	Journal
Year	2012
Where	SDM
Authors	Hoang Thanh Lam, Fabian Moerchen, Dmitriy Fradkin, Toon Calders


Sciweavers
