Order-preserving submatrixes (OPSMs) have been accepted as a biologically meaningful subspace cluster model, capturing the general tendency of gene expressions across a subset of ...
Byron J. Gao, Obi L. Griffith, Martin Ester, Steve...
Localized search engines are small-scale systems that index a particular community on the web. They offer several benefits over their large-scale counterparts in that they are rel...
Typically, data collected by a spacecraft is downlinked to Earth and pre-processed before any analysis is performed. We have developed classifiers that can be used onboard a space...
Ashley Davies, Benjamin Cichy, Dominic Mazzoni, Ng...
To learn concepts over massive data streams, it is essential to design inference and learning methods that operate in real time with limited memory. Online learning methods such a...
We study the mining of interesting patterns in the presence of numerical attributes. Instead of the usual discretization methods, we propose the use of rank based measures to scor...
Collaborative recommender systems are highly vulnerable to attack. Attackers can use automated means to inject a large number of biased profiles into such a system, resulting in r...
Robin D. Burke, Bamshad Mobasher, Chad Williams, R...
In this work we focus on the problem of frequent itemset mining on large, out-of-core data sets. After presenting a characterization of existing out-of-core frequent itemset minin...
Often the best performing supervised learning models are ensembles of hundreds or thousands of base-level classifiers. Unfortunately, the space required to store this many classif...
Cristian Bucila, Rich Caruana, Alexandru Niculescu...
The output of a data mining algorithm is only as good as its inputs, and individuals are often unwilling to provide accurate data about sensitive topics such as medical history an...
This paper describes a novel classification method for computer aided detection (CAD) that identifies structures of interest from medical images. CAD problems are challenging larg...