The k-nearest-neighbor rule is one of the most attractive pattern classification algorithms. In practice, the value of k is usually chosen by cross-validation. In this work, we propose a new method for neighborhood size selection that is based on the concept of statistical confidence. We define the confidence associated with a decision made by the majority rule from a finite number of observations and use it as a criterion to determine the number of nearest neighbors needed. The new algorithm is tested on several real-world datasets and yields results comparable to those of the k-nearest-neighbor rule. However, in contrast to the k-nearest-neighbor rule, which uses a fixed number of nearest neighbors throughout the feature space, our method locally adjusts the number of nearest neighbors until a satisfactory level of confidence is reached. In addition, the statistical confidence provides a natural way to balance the trade-off between the reject rate and the error rate by excludin...
Jigang Wang, Predrag Neskovic, Leon N. Cooper
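
The idea described in the abstract, growing the neighborhood around a query point until the majority vote reaches a chosen confidence level and rejecting the point otherwise, can be illustrated with a minimal sketch. The abstract does not give the exact definition of confidence, so the code below makes its own assumptions: a two-class problem, squared Euclidean distance, and confidence computed as one minus the one-sided binomial tail probability of seeing a majority at least this large when both classes are equally likely. The names `majority_confidence`, `adaptive_knn_predict`, `conf_level`, and `max_k` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def majority_confidence(n_major, n_total):
    """Assumed confidence measure (not necessarily the paper's definition).

    Returns 1 - P(X >= n_major) for X ~ Binomial(n_total, 0.5), i.e. one minus
    the chance of a majority this large if both classes were equally likely.
    """
    tail = sum(math.comb(n_total, i) for i in range(n_major, n_total + 1))
    return 1.0 - tail / 2 ** n_total


def adaptive_knn_predict(train_X, train_y, query, conf_level=0.95, max_k=None):
    """Grow the neighborhood one neighbor at a time until the majority vote
    reaches conf_level. Returns (label, k_used), or (None, k_tried) to signal
    a reject when no neighborhood size up to max_k is confident enough.
    """
    # Rank training points by squared Euclidean distance to the query.
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )
    if max_k is None:
        max_k = len(order)

    counts = Counter()
    k = 0
    for k, idx in enumerate(order[:max_k], start=1):
        counts[train_y[idx]] += 1
        label, n_major = counts.most_common(1)[0]
        if majority_confidence(n_major, k) >= conf_level:
            return label, k
    return None, k  # reject: confidence level never reached


if __name__ == "__main__":
    # Toy 1-D data: two well-separated clusters (illustrative values only).
    X = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,),
         (10.0,), (10.1,), (10.2,), (10.3,), (10.4,)]
    y = [0] * 5 + [1] * 5
    print(adaptive_knn_predict(X, y, (0.05,)))
```

Under these assumptions, a query in a region where the neighbors agree stops after only a few neighbors, while a query in a mixed region keeps enlarging its neighborhood and may end up rejected. Raising `conf_level` increases the number of rejected points, which is the reject-rate versus error-rate trade-off the abstract refers to.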