CAINE
2003


A Genetic Algorithm for Clustering on Very Large Data Sets


Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP-hard. Genetic algorithms have been used in a wide variety of fields to perform clustering; however, the technique normally has a long running time in terms of input set size. This paper proposes an efficient genetic algorithm for clustering on very large data sets. The genetic algorithm uses the most time-efficient traditional techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real data sets, both of which are of large size. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.

Jim Gasvoda, Qin Ding
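
The abstract does not spell out the paper's encoding, operators, or preprocessing step, so the following is only a generic illustration of how a genetic algorithm can be applied to centroid-based clustering. It is not the authors' algorithm. Every name and parameter here (`ga_cluster`, `sse`, `assign`, the population size, the mutation rate, the tournament size) is an assumption chosen for the sketch: each chromosome is a list of k candidate centroids, and fitness is the sum of squared distances from each point to its nearest centroid, the same objective k-means minimizes.

```python
import random


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def ga_cluster(points, k, pop_size=20, generations=40, mutation_rate=0.2, seed=0):
    """Hypothetical GA sketch: evolve lists of k centroids to minimize SSE."""
    rng = random.Random(seed)
    # Each chromosome starts as k distinct points sampled from the data.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    best = min(population, key=lambda c: sse(points, c))
    for _ in range(generations):
        next_gen = [best]  # elitism: the best chromosome always survives
        while len(next_gen) < pop_size:
            # Tournament selection (size 2) for each parent.
            p1 = min(rng.sample(population, 2), key=lambda c: sse(points, c))
            p2 = min(rng.sample(population, 2), key=lambda c: sse(points, c))
            # One-point crossover over the list of centroids.
            cut = rng.randrange(1, k) if k > 1 else 0
            child = list(p1[:cut]) + list(p2[cut:])
            # Mutation: jitter one centroid with Gaussian noise.
            if rng.random() < mutation_rate:
                i = rng.randrange(k)
                child[i] = tuple(x + rng.gauss(0, 0.5) for x in child[i])
            next_gen.append(child)
        population = next_gen
        best = min(population, key=lambda c: sse(points, c))
    return best, assign(points, best)
```

Called as `ga_cluster(points, k)` with `points` given as a list of equal-length tuples, the function returns the best centroid list found and a cluster label per point. Recomputing the fitness of every chromosome over the full data set each generation is the cost the abstract alludes to when it notes that genetic algorithms normally have long running times on large inputs.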



Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	CAINE
Authors	Jim Gasvoda, Qin Ding
