We present an improved bound on the difference between training and test errors for voting classifiers. This improved averaging bound provides a theoretical justification for popu...
Protein secondary structure prediction and high-throughput drug screen data mining are two important applications in bioinformatics. The data is represented in sparse feature spac...
Steven Eschrich, Nitesh V. Chawla, Lawrence O. Hal...
Large-scale hosting infrastructures require automatic system anomaly management to achieve continuous system operation. In this paper, we present a novel adaptive runtime anomaly ...
Data quality is critical for many information-intensive applications. One of the best opportunities to improve data quality is during entry. USHER provides a theoretical, data-dri...
Kuang Chen, Joseph M. Hellerstein, Tapan S. Parikh
Predictive models developed with data mining techniques are used to improve forecasting accuracy in the airline business. To maximize the revenue on a flight, the ...