We present a multi-dimensional indexing approach for fast sequence similarity search in DNA and protein databases. In particular, we propose effective transformations of subsequen...
Join techniques deploying approximate match predicates are fundamental data cleaning operations. A variety of predicates have been utilized to quantify approximate match in such o...
Sudipto Guha, Nick Koudas, Divesh Srivastava, Xiao...
Similarity search and similarity join on strings are important for applications such as duplicate detection, error detection, data cleansing, or comparison of biological sequences....
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...
We suggest a local hybrid approximation scheme based on polynomials and radial basis functions, and use it to modify the scattered data fitting algorithm of [7]. Similar to that a...