LAPIN-SPAM: An Improved Algorithm for Mining Sequential Pattern

16 years 4 days ago

Download www.tkl.iis.u-tokyo.ac.jp

Sequential pattern mining is an important research problem because it is the basis of many other applications. Yet implementing the mining efficiently is difficult because of an inherent characteristic of the problem: the large size of the data set. In this paper, building on SPAM, we propose a new algorithm called LAst Position INduction Sequential PAttern Mining (abbreviated as LAPIN-SPAM), which can efficiently find all the frequent sequential patterns in a large database. The main difference between our strategy and previous work lies in how a candidate sequence is judged to be a pattern: earlier algorithms either build an S-Matrix by scanning the projected database (PrefixSpan), or count support by joining (SPADE) or by ANDing with the candidate item (SPAM). In contrast, LAPIN-SPAM can implement this process easily based on the following fact: if an item's last position is smaller than the current prefix position, the item cannot appear behind the current prefix in the same customer sequence....
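The last-position fact stated above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each customer sequence is a plain list of single items (no itemsets and no SPAM-style bitmaps), uses 0-based positions, and the helper names `last_positions`, `first_end_position`, and `extension_support` are hypothetical names chosen for this example.

```python
from collections import defaultdict


def last_positions(sequence):
    """Map each item to the index of its last occurrence in one customer sequence."""
    # Later occurrences overwrite earlier ones, so the final value is the last index.
    return {item: pos for pos, item in enumerate(sequence)}


def first_end_position(sequence, prefix):
    """Return the index where the earliest match of `prefix` ends, or None if absent."""
    pos = -1
    for item in prefix:
        try:
            pos = sequence.index(item, pos + 1)
        except ValueError:
            return None
    return pos


def extension_support(database, prefix):
    """Count, per item, the customer sequences in which it can extend `prefix`.

    Assumption of this sketch: an item can follow the prefix in a sequence
    exactly when its last position lies strictly after the prefix position,
    so one comparison per item replaces a rescan of the rest of the sequence.
    """
    support = defaultdict(int)
    for sequence in database:
        end = first_end_position(sequence, prefix)
        if end is None:
            continue  # this customer sequence does not contain the prefix
        for item, last in last_positions(sequence).items():
            if last > end:
                support[item] += 1
    return dict(support)


database = [list("abcb"), list("acb"), list("bca")]
print(extension_support(database, ["a"]))
```

In the third sequence, `a` occurs only at the final position, so every item's last position is at or before the prefix position and nothing is counted for that customer. In a real miner the last-position table would be built once per sequence rather than on every call; it is recomputed here only to keep the sketch short.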

Zhenglu Yang, Masaru Kitsuregawa
