Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information, which originates from measurement or quantisation errors, data staleness, repeated measurements, and so on. Value uncertainty is represented not by a single point value but by a probability distribution function (pdf) over a range of values. We discover that the accuracy of a decision tree classifier can be much improved if the whole pdf, rather than a simple statistic such as the mean, is taken into account. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Since processing pdfs is computationally more costly than processing single point values, we propose a series of pruning techniques that greatly improve the efficiency of decision tree construction.
Smith Tsang, Ben Kao, Kevin Y. Yip, Wai-Shing Ho, and Sau Dan Lee
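To make the distribution-based idea concrete, here is a minimal sketch (not the authors' code) of how a candidate split can be evaluated when each tuple carries a discretized pdf: instead of sending a tuple wholly to one branch, the split sends the fractional probability mass on each side of the threshold. All names (UncertainTuple, entropy, split_entropy) are hypothetical, and the pdf is assumed to be given as (value, probability) sample points summing to 1.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import log2
from typing import List, Tuple

@dataclass
class UncertainTuple:
    # Discretized pdf: (value, probability) sample points summing to 1.
    pdf: List[Tuple[float, float]]
    label: str

def entropy(weights_by_label: dict) -> float:
    """Entropy of a fractional (real-valued) class distribution."""
    total = sum(weights_by_label.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * log2(w / total)
                for w in weights_by_label.values() if w > 0)

def split_entropy(data: List[UncertainTuple], threshold: float) -> float:
    """Weighted entropy after splitting at `threshold`: each tuple
    contributes the probability mass of its pdf on each side."""
    left, right = defaultdict(float), defaultdict(float)
    for t in data:
        p_left = sum(p for v, p in t.pdf if v <= threshold)
        left[t.label] += p_left
        right[t.label] += 1.0 - p_left
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    return (n_left / n) * entropy(left) + (n_right / n) * entropy(right)

# With degenerate (single-point) pdfs this reduces to a classical C4.5-style
# split; with spread-out pdfs, a tuple straddling the threshold is counted
# fractionally on both sides.
data = [
    UncertainTuple(pdf=[(1.0, 0.5), (3.0, 0.5)], label="A"),
    UncertainTuple(pdf=[(2.0, 1.0)], label="B"),
    UncertainTuple(pdf=[(4.0, 0.7), (5.0, 0.3)], label="A"),
]
print(split_entropy(data, threshold=2.0))
```

This brute-force evaluation is what makes handling pdfs costly: every sample point of every tuple's pdf is a candidate threshold. The pruning techniques the abstract refers to bound the achievable entropy reduction so that unpromising candidate split points can be skipped; the sketch above omits them.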