When a partitional structure is derived from a data set using a data mining algorithm, it is not unusual to obtain a different set of outcomes when the algorithm is run with a different order of...
This paper presents a general framework for adapting any generative (model-based) clustering algorithm to provide balanced solutions, i.e., clusters of comparable sizes. Partition...
Most cost-function-based clustering or partitioning methods measure the compactness of groups of data. In contrast to this picture of a point source in feature space, some data sou...
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to ge...
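To make the programming model in the MapReduce abstract concrete, here is a minimal single-process sketch in Python. It assumes the usual shape of the model: a map function emits intermediate key/value pairs, and a reduce function merges all values that share a key. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical and chosen for illustration; this is a toy in-memory driver, not the distributed implementation the abstract refers to.

```python
from collections import defaultdict


def map_fn(key, value):
    """Hypothetical map step: emit (word, 1) for every word in a document.

    key is the document name (unused here), value is the document text.
    """
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Hypothetical reduce step: sum all counts emitted for one word."""
    return key, sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy in-memory driver (an assumption, not a real framework API)."""
    # Group the intermediate values by their intermediate key.
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Apply the reduce function once per distinct intermediate key.
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


documents = [("a.txt", "the cat"), ("b.txt", "The dog")]
counts = run_mapreduce(documents, map_fn, reduce_fn)
```

In this sketch the grouping step stands in for the shuffle phase; a real deployment would partition the intermediate keys across machines rather than hold them in one dictionary.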
In Dirichlet process (DP) mixture models, the number of components is implicitly determined by the sampling parameters of the Dirichlet process. However, models of this kind usually...