Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...
Background: Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the under...
One viewpoint of a knowledge network is a knowledge map that clusters similar knowledge sources into knowledge domains. What is needed is an automatic mapping tool that 1) takes t...
In this paper, we propose a fast, memory-efficient, and scalable clustering algorithm for analyzing transactional data. Our approach has three unique features. First, we use the c...
Text categorization involves mapping documents to a fixed set of labels. A similar but equally important problem is that of assigning labels to large corpora. With a deluge of ...