This paper shows how a text classifier's need for labeled training documents can be reduced by taking advantage of a large pool of unlabeled documents. We modify the Query-by...
Users of Web search engines are often forced to sift through the long ordered list of document “snippets” returned by the engines. The IR community has explored document cluste...
1 Document clustering is an aggregation of related documents to a cluster based on the similarity evaluation task between documents and the representatives of clusters. Terms and t...
Abstract. A major problem encountered by text clustering practitioners is the difficulty of determining a priori which is the optimal text representation and clustering technique f...
We study methods to initialize or bias different clustering methods using prior information about the "importance" of a keyword w.r.t. the whole document collection or a...