Sciweavers

1038 search results - page 14 / 208
» A Genetic Algorithm for Clustering on Very Large Data Sets
GECCO
2006
Springer
14 years 7 days ago
Characterizing large text corpora using a maximum variation sampling genetic algorithm
An enormous amount of information available via the Internet exists. Much of this data is in the form of text-based documents. These documents cover a variety of topics that are v...
Robert M. Patton, Thomas E. Potok
KES
2008
Springer
13 years 8 months ago
An Algorithm to Assess the Reliability of Hierarchical Clusters in Gene Expression Data
The validation of clusters discovered in bio-molecular data is a central issue in bioinformatics. Recently, stability-based methods have been successfully applied to the analysis o...
Roberto Avogadri, Matteo Brioschi, Francesca Ruffi...
KDD
2000
ACM
14 years 4 days ago
Efficient clustering of high-dimensional data sets with application to reference matching
Many important problems involve clustering large datasets. Although naive implementations of clustering are computationally expensive, there are established efficient techniques f...
Andrew McCallum, Kamal Nigam, Lyle H. Ungar
SIGMOD
2001
ACM
14 years 8 months ago
Data Bubbles: Quality Preserving Performance Boosting for Hierarchical Clustering
In this paper, we investigate how to scale hierarchical clustering methods (such as OPTICS) to extremely large databases by utilizing data compression methods (such as BIRCH or ra...
Markus M. Breunig, Hans-Peter Kriegel, Peer Krö...
VLDB
2005
ACM
14 years 2 months ago
Selectivity Estimation for Fuzzy String Predicates in Large Data Sets
Many database applications have the emerging need to support fuzzy queries that ask for strings that are similar to a given string, such as “name similar to smith” and “tele...
Liang Jin, Chen Li