A distributed memory parallel version of the group average Hierarchical Agglomerative Clustering algorithm is proposed to enable scaling the document clustering problem to large c...
Rebecca Cathey, Eric C. Jensen, Steven M. Beitzel,...
In most IR clustering problems, we directly cluster the documents, working in the document space, using cosine similarity between documents as the similarity measure. In many real...
Abstract. We have found that the nearest neighbor (NN) test is an insufficient measure of the cluster hypothesis. The NN test is a local measure of the cluster hypothesis. Designer...
Abstract—Key program interfaces are sometimes documented with usage examples: concrete code snippets that characterize common use cases for a particular data type. While such doc...
Text clustering is most commonly treated as a fully automated task without user feedback. However, a variety of researchers have explored mixed-initiative clustering methods which...