Many different relative clustering validity criteria exist that are very useful in practice as quantitative measures for evaluating the quality of data partitions, and new criter...
Lucas Vendramin, Ricardo J. G. B. Campello, Eduard...
Abstract. This paper presents a language-independent Multilingual Document Clustering (MDC) approach on comparable corpora. Named entities (NEs) such as persons, locations, organiza...
Using gene expression data for cancer detection is a well-known research topic in bioinformatics. Theoretically, gene expression data is capable of detecting all types of early...
Larry T. H. Yu, Fu-Lai Chung, Stephen Chi-fai Chan...
Coreferencing entities across documents in a large corpus enables advanced document understanding tasks such as question answering. This paper presents a novel cross document core...
Jian Huang 0002, Sarah M. Taylor, Jonathan L. Smit...
Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we i...
Hanna M. Wallach, Shane Jensen, Lee Dicker, Kather...