One fundamental task in near-neighbor search as well as other similarity matching efforts is to find a distance function that can efficiently quantify the similarity between two objects in a meaningful way. In DNA microarray analysis, the expression levels of two closely related genes may rise and fall synchronously in response to a set of experimental stimuli. Although the magnitude of their expression levels may not be close, the patterns they exhibit can be very similar. Unfortunately, none of the conventional distance metrics such as the Lp norm can model this similarity effectively. In this paper, we study the near-neighbor search problem based on this new type of similarity. We propose to measure the distance between two genes by subspace pattern similarity, i.e., whether they exhibit a synchronous pattern of rise and fall on a subset of dimensions. We then present an efficient algorithm for subspace near-neighbor search based on pattern similarity distance, and we perform tests...
Haixun Wang, Jian Pei, Philip S. Yu