We consider the general, widely applicable problem of selecting from n real-valued random variables a subset of size m of those with the highest means, based on as few samples as ...
The large number of genes and the relatively small number of samples are typical characteristics of microarray data. These characteristics pose challenges for both sample classif...
Any large piece of language processing software relies on heuristic decisions concerning its processing strategy. These decisions are usually "hard-wired" int...
In this paper, we present a new rule induction algorithm for machine learning in medical diagnosis. Medical datasets, like many other real-world datasets, exhibit an imbalanced clas...
Statistical machine learning methods are employed to train a Named Entity Recognizer from annotated data. Methods like Maximum Entropy and Conditional Random Fields make use of fe...