The primary objective of disparities research is to model the differences across multiple groups and to identify the groups that differ significantly from one another. Generating decision trees independently for different subsets of the data does not allow us to study the impact of the individual attributes on these subgroups. We propose a novel technique for inducing similar decision trees for different subpopulations, and we develop a new distance metric between two decision trees that measures the difference in the underlying data distributions of these subgroups. We evaluate the proposed framework by analyzing racial disparities in breast cancer. Our method was able to rank different populations with respect to disparity and to detect the attributes most responsible for these differences.
Indranil Palit, Chandan K. Reddy, Kendra L. Schwar