Sciweavers

A Framework for Frequent Sequence Mining under Generalized Regular Expression Constraints

This paper provides a framework for the extraction of frequent sequences satisfying a given regular expression (RE) constraint. We take advantage of the information contained in the hierarchical representation of an RE as an abstract syntax tree (AST). Interestingly, pruning can be based on the anti-monotonicity of the minimal frequency constraint, but also on the RE constraint, even though the latter is generally not anti-monotonic. The AST representation makes it possible to examine the decomposition of the RE and to choose dynamically an adequate extraction method according to the local selectivity of the sub-REs. Our algorithm, RE-Hackle, explores only the candidate space spanned by the regular expression, and prunes it at each level. Due to the dynamic choice of the exploration method, this algorithm surpasses its predecessors. We provide an experimental validation on both synthetic data and a real genomic sequence database. Furthermore, we show how this framework can be extended to regular expressio...
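The contrast the abstract draws between the two constraints can be illustrated with a small sketch. This is not RE-Hackle: it is a naive level-wise search that prunes on frequency only and applies the RE as a final filter, which is the baseline the paper aims to improve on. The toy database, the function names, and the assumption that a pattern occurs in a sequence when it appears as a contiguous substring are all hypothetical choices made for illustration.

```python
import re

# Hypothetical toy database; not data from the paper.
DATABASE = ["abdc", "abd", "acd", "abdd"]
RE_CONSTRAINT = "a(b|c)d"


def support(pattern, database):
    """Count the sequences that contain `pattern` as a contiguous substring."""
    return sum(1 for seq in database if pattern in seq)


def satisfies(pattern, regex):
    """Return True if the whole pattern matches the regular expression."""
    return re.fullmatch(regex, pattern) is not None


def frequent_constrained(database, regex, min_support, max_len):
    """Naive level-wise search: prune on frequency, filter on the RE afterwards."""
    alphabet = sorted(set("".join(database)))
    level = [s for s in alphabet if support(s, database) >= min_support]
    results = []
    length = 1
    while level and length <= max_len:
        # The RE is only a filter here: it cannot discard a candidate early,
        # because a non-matching pattern may still extend to a matching one.
        results.extend(p for p in level if satisfies(p, regex))
        # Frequency is anti-monotonic, so only frequent patterns are extended.
        candidates = {p + s for p in level for s in alphabet}
        level = sorted(c for c in candidates
                       if support(c, database) >= min_support)
        length += 1
    return results


found = frequent_constrained(DATABASE, RE_CONSTRAINT, min_support=2, max_len=3)
print(found)
```

In this toy run, "ab" is frequent but does not satisfy the RE, while its extension "abd" does. Discarding "ab" on the RE alone would therefore lose a valid answer, which is why the RE constraint cannot be used for pruning in the same direct way as the frequency constraint.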
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where KDID
Authors Hunor Albert-Lorincz, Jean-François Boulicaut