Carpenter: finding closed patterns in long biological datasets



The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows. For example, many gene expression datasets may contain 10,000-100,000 columns but only 100-1000 rows. Such datasets pose a great challenge for existing (closed) frequent pattern discovery algorithms, since these algorithms have an exponential dependence on the average row length. In this paper, we describe a new algorithm called CARPENTER that is specially designed to handle datasets having a large number of attributes and a relatively small number of rows. Several experiments on real bioinformatics datasets show that CARPENTER is orders of magnitude better than previous closed pattern mining algorithms like CLOSET and CHARM.
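The abstract does not spell out how CARPENTER works, so the following is only a brute-force illustration of why a small row count can be exploited, not the paper's algorithm. It relies on a standard property: every closed frequent pattern is the intersection of some set of at least `min_support` rows, so the search can range over row subsets instead of column subsets. The function name `closed_patterns_by_rows` and the toy gene labels are hypothetical.

```python
from itertools import combinations


def closed_patterns_by_rows(rows, min_support):
    """Brute-force sketch (NOT the CARPENTER algorithm): find closed
    frequent patterns by intersecting every subset of >= min_support rows.

    rows: list of item collections (one per row).
    Returns a dict mapping each closed pattern (frozenset) to its support.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")

    row_sets = [frozenset(r) for r in rows]
    n = len(row_sets)
    closed = {}

    # The number of subsets grows with the number of rows (2^n),
    # independent of how many columns (items) each row has.
    for k in range(min_support, n + 1):
        for subset in combinations(range(n), k):
            items = frozenset.intersection(*(row_sets[i] for i in subset))
            if not items:
                continue  # ignore the empty pattern
            # Support is the number of rows that contain the pattern.
            closed[items] = sum(1 for r in row_sets if items <= r)
    return closed


# Toy data: 3 rows (samples), items are hypothetical gene labels.
toy_rows = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g2", "g3"}]
for pattern, support in closed_patterns_by_rows(toy_rows, 2).items():
    print(sorted(pattern), support)
```

This sketch enumerates all row subsets and is exponential in the number of rows, so it is practical only for very small inputs. It is meant to show the shift in search space, not to suggest a usable implementation.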

Feng Pan, Gao Cong, Anthony K. H. Tung, Jiong Yang, Mohammed Javeed Zaki


Data Mining | Gene Expression Datasets | KDD 2003 | Real Bioinformatics Datasets | Such Datasets


Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2003
Where	KDD
Authors	Feng Pan, Gao Cong, Anthony K. H. Tung, Jiong Yang, Mohammed Javeed Zaki
