Addressed in this paper is the issue of `email data cleaning' for text mining. Many text mining applications need take emails as input. Email data is usually noisy and thus i...
We propose to use AdaBoost to efficiently learn classifiers over very large and possibly distributed data sets that cannot fit into main memory, as well as on-line learning wher...
This paper proposes the integration of semantic information drawn from a web application’s domain knowledge into all phases of the web usage mining process (preprocessing, patte...
Order-preserving submatrixes (OPSMs) have been accepted as a biologically meaningful subspace cluster model, capturing the general tendency of gene expressions across a subset of ...
Byron J. Gao, Obi L. Griffith, Martin Ester, Steve...
We present a novel algorithm called Clicks, that finds clusters in categorical datasets based on a search for k-partite maximal cliques. Unlike previous methods, Clicks mines subs...
Mohammed Javeed Zaki, Markus Peters, Ira Assent, T...