ISBRA
2007
Springer

Clustering Algorithms Optimizer: A Framework for Large Datasets

Clustering algorithms are employed in many bioinformatics tasks, including categorization of protein sequences and analysis of gene-expression data. Although these algorithms are routinely applied, many of them suffer from the following limitations: (i) reliance on predetermined parameters, such as a priori knowledge of the number of clusters; (ii) use of nondeterministic procedures that yield inconsistent outcomes. Thus, a framework that addresses these shortcomings is desirable. We provide a data-driven framework that includes two interrelated steps. The first is SVD-based dimension reduction and the second is automated tuning of the algorithm’s parameter(s). The dimension reduction step is efficiently adjusted for very large datasets. The optimal parameter setting is identified according to an internal evaluation criterion, the Bayesian Information Criterion (BIC). This framework can incorporate most clustering algorithms and improve their performance. In...
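The two steps named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: a plain dense SVD stands in for the large-dataset variant the abstract mentions, k-means with a deterministic farthest-first initialization stands in for "the clustering algorithm", the number of clusters k is the tuned parameter, and the BIC is computed for a hard-assignment spherical Gaussian model (lower is better). The function names (svd_reduce, kmeans, bic, choose_k, make_demo_data) are hypothetical.

```python
import numpy as np

def svd_reduce(X, n_components):
    """Step 1 (assumed form): center X and project onto its top right-singular vectors."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def farthest_first_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the farthest point."""
    centers = [X[0]]
    min_d = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        idx = int(min_d.argmax())
        centers.append(X[idx])
        min_d = np.minimum(min_d, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers)

def assign(X, centers):
    """Index of the nearest center for every row of X."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic seeding, so repeated runs agree."""
    centers = farthest_first_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers

def bic(X, labels, centers):
    """BIC of a hard-assignment spherical Gaussian model (assumed formulation)."""
    n, d = X.shape
    k = len(centers)
    rss = ((X - centers[labels]) ** 2).sum()
    sigma2 = max(rss / (n * d), 1e-12)  # shared variance, maximum-likelihood estimate
    counts = np.bincount(labels, minlength=k)
    nz = counts[counts > 0]
    log_lik = (nz * np.log(nz / n)).sum() - 0.5 * n * d * (np.log(2 * np.pi * sigma2) + 1)
    n_params = (k - 1) + k * d + 1  # mixing weights, centers, variance
    return -2.0 * log_lik + n_params * np.log(n)

def choose_k(X, k_values):
    """Step 2 (assumed form): pick the k with the lowest BIC."""
    scores = {}
    for k in k_values:
        labels, centers = kmeans(X, k)
        scores[k] = bic(X, labels, centers)
    return min(scores, key=scores.get), scores

def make_demo_data(seed=0, n_per_cluster=100, dim=50):
    """Synthetic data: three well-separated Gaussian clusters (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(3, dim))
    return np.vstack([c + rng.normal(size=(n_per_cluster, dim)) for c in centers])

if __name__ == "__main__":
    Z = svd_reduce(make_demo_data(), n_components=2)
    best_k, scores = choose_k(Z, range(1, 7))
    print("selected k:", best_k)
```

The deterministic seeding is a design choice made here to reflect the abstract's concern about nondeterministic procedures; any other clustering algorithm with a tunable parameter could be substituted for kmeans in choose_k.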
Roy Varshavsky, David Horn, Michal Linial
Added 08 Jun 2010
Updated 08 Jun 2010
Type Conference