Identification of all objects in a dataset whose similarity is not less than a specified threshold is of major importance for management, search, and analysis of data. Set similari...
We consider the problem of obtaining a reduced dimension representation of electropalatographic (EPG) data. An unsupervised learning approach based on latent variable modelling is...
Low-Complexity Regions (LCRs) of biological sequences are the main source of false positives in similarity searches for biological sequence databases. We consider the problem of ...
When searching databases of nucleotide or protein sequences, finding a local alignment of two sequences is one of the main tasks. Since the sizes of available databases grow const...
In this paper we give approximation algorithms for several proximity problems in high-dimensional spaces. In particular, we give the first Las Vegas data structure for (1 + ε)-neares...