In many large e-commerce organizations, multiple data sources are often used to describe the same customers, thus it is important to consolidate data of multiple sources for intell...
In this paper, we describe the development of a fielded application for detecting malicious executables in the wild. We gathered 1971 benign and 1651 malicious executables and enc...
Inspired by co-training, many multi-view semi-supervised kernel methods implement the following idea: find a function in each of multiple Reproducing Kernel Hilbert Spaces (RKHSs)...
Despite popular belief, boosting algorithms and related coordinate descent methods are prone to overfitting. We derive modifications to AdaBoost and related gradient-based coordin...
This paper examines high dimensional regression with noise-contaminated input and output data. Goals of such learning problems include optimal prediction with noiseless query poin...