Clustering is one of the most important tasks performed in Data Mining applications. This paper presents an efficient SQL implementation of the EM algorithm to perform clustering in...
High dimensional directional data is becoming increasingly important in contemporary applications such as analysis of text and gene-expression data. A natural model for multivaria...
Arindam Banerjee, Inderjit S. Dhillon, Joydeep Gho...
Using gene expression data for cancer detection is a well-known research topic in bioinformatics. In theory, gene expression data is capable of detecting all types of early...
Larry T. H. Yu, Fu-Lai Chung, Stephen Chi-fai Chan...
Microarray datasets are often too large to visualise because of their high dimensionality. The self-organising map has been found useful for analysing massive complex datasets. It can be us...
A major challenge in document clustering is the extremely high dimensionality. For example, the vocabulary for a document set can easily be thousands of words. On the other hand, ...