Rare category detection is an open challenge for active learning, especially in the de-novo case (no labeled examples), but of significant practical importance for data mining - ...
This paper proposes an efficient relevance feedback based interactive model for keyword generation in sponsored search advertising. We formulate the ranking of relevant terms as a...
act 11 We describe an ensemble approach to learning from arbitrarily partitioned data. The partitioning comes from the distributed process12 ing requirements of a large scale simul...
Larry Shoemaker, Robert E. Banfield, Lawrence O. H...
We introduce a Bayesian model, BayesANIL, that is capable of estimating uncertainties associated with the labeling process. Given a labeled or partially labeled training corpus of...
Deduplication, a key operation in integrating data from multiple sources, is a time-consuming, labor-intensive and domainspecific operation. We present our design of alias that us...