Probabilistic data have recently become popular in applications such as scientific and geospatial databases. For images and other spatial datasets, probabilistic values can capture the uncertainty in extent and class of the objects in the images. Relating one such dataset to another by spatial joins is an important operation for data management systems. We consider probabilistic spatial join (PSJ) queries, which rank the results according to a score that incorporates both the uncertainties associated with the objects and the distances between them. We present algorithms for two kinds of PSJ queries: Threshold PSJ queries, which return all pairs that score above a given threshold, and top-k PSJ queries, which return the k top-scoring pairs. For threshold PSJ queries, we propose a plane sweep algorithm that, because it exploits the special structure of the problem, runs in O(n (log n + k)) time, where n is the number of points and k is the number of results. We extend the algorithms to 2...
Vebjorn Ljosa, Ambuj K. Singh