We describe a case study in data mining for personal loan evaluation, performed at the ABN AMRO bank in the Netherlands. Historical data on clients and their pay-back behaviour are used to learn to predict whether a client will default or not. It is shown that, because of the pre-selection by a credit scoring system, the database is a sample from a different population than the one the bank is actually interested in; this necessarily restricts inference as well. Furthermore, we point out the importance of integrity and consistency checking when the data are entered into the system: noise is a serious problem. The actual experimental comparison involves a "classical" statistical method, linear discriminant analysis, and the classification tree algorithm C4.5. Both methods use one and the same training set, drawn from the historical database, to learn a classification function. The percentages of correct classifications on
A. J. Feelders, A. J. F. le Loux, J. W. van't Zand