In many data mining applications, online labeling feedback is only available for examples which were predicted to belong to the positive class. Such applications include spam filt...
Background: Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been deve...
We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models...
Abstract-- Data uncertainty is common in real-world applications due to various causes, including imprecise measurement, network latency, outdated sources and sampling errors. Thes...
How to assess the performance of machine learning algorithms is a problem of increasing interest and urgency as the data mining application of myriad algorithms grows. The standard...