Haiku is a data mining system which combines the best properties of human and machine discovery. A self-organising visualisation system is coupled with a genetic algorithm to provide an interactive, flexible system. Visualisation of data allows the human visual system to identify areas of interest, such as clusters, outliers or trends. A machine learning algorithm based on a genetic algorithm can then be used to explain the patterns identified visually. The explanations (in rule form) can be biased to be short or long; to contain all the characteristics of a cluster or just those needed to predict membership; or to concentrate on accuracy or on coverage of the data. This paper describes both the visualisation system and the machine learning component, with a focus on the interactive nature of the data mining process, and provides case studies to demonstrate the capabilities of the system.
Russell Beale, Andy Pryke, Robert J. Hendley