Logical entity recognition in heterogeneous collections of document page images remains a challenging problem since the performance of traditional supervised methods degrade drama...
We present a novel unsupervised learning scheme that simultaneously clusters variables of several types (e.g., documents, words and authors) based on pairwise interactions between...
An author may have multiple names and multiple authors may share the same name simply due to name abbreviations, identical names, or name misspellings in publications or bibliogra...
Abstract. Clustering is often considered the most important unsupervised learning problem and several clustering algorithms have been proposed over the years. Many of these algorit...
We propose a new algorithm for dimensionality reduction and unsupervised text classification. We use mixture models as underlying process of generating corpus and utilize a novel,...