In many important text classification problems, acquiring class labels for training documents is costly, while gathering large quantities of unlabeled data is cheap. This paper sh...
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom...
Abstract. Most classification methods assume that the samples are drawn independently and identically from an unknown data generating distribution, yet this assumption is violated ...
Essentially all data mining algorithms assume that the datagenerating process is independent of the data miner's activities. However, in many domains, including spam detectio...
Nilesh N. Dalvi, Pedro Domingos, Mausam, Sumit K. ...
In machine learning problems with tens of thousands of features and only dozens or hundreds of independent training examples, dimensionality reduction is essential for good learni...
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally repres...
Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hal...