Though it has been possible in the past to learn to predict DNA hydration patterns from crystallographic data, there is ambiguity in the choice of training data (both in terms of the relevant set of cases and the features needed to represent them), which limits the usefulness of standard learning techniques. Thus, we have developed a knowledge-based system to generate machine learning experiments for inducing DNA hydration pattern classifiers. The system takes as input (1) a set of classified training examples described by a large set of attributes and (2) information about a set of learning experiments that have already been run. It outputs a new learning experiment, namely a (not necessarily proper) subset of the input examples represented by a new set of features. Domain-specific and domain-independent knowledge is used to suggest subsets of training examples from suspected subpopulations, transform attributes in the training data or generate new ones, and choose interesting ways to subst...
Dawn M. Cohen, Casimir A. Kulikowski, Helen Berman