Due in part to the large volume of data available today, but more importantly to privacy concerns, data are often distributed across institutional, geographical and organizational boundaries rather than being stored in a centralized location. Data can be distributed by separating objects or attributes: in the homogeneous case, sites contain subsets of objects with all attributes, while in the heterogeneous case sites contain subsets of attributes for all objects. Ensemble approaches combine the results obtained from a number of classifiers to obtain a final classification. In this paper, we present a novel ensemble approach, in which data is partitioned by attributes. We show that this method can successfully be applied to a wide range of data and can even produce an increase in classification accuracy compared to a centralized technique. As an ensemble approach, our technique exchanges models or classification results instead of raw data, which makes it suitable for privacy preservin...
Sabine M. McConnell, David B. Skillicorn