In high-dimensional data sets, not all dimensions contain an equal amount of information, and global features are usually more important than local differences. This makes ...
Advanced data mining applications require more and more support from relational database engines. In particular, clustering applications in high-dimensional feature spaces demand a pr...
Cloud-enabled systems have become a crucial component for efficiently processing and analyzing massive amounts of data. One of the key data processing and analysis operations is the Sim...
Similarity joins are important operations with a broad range of applications. In this paper, we study the problem of vector similarity join size estimation (VSJ). It is a generali...
Data cleaning based on similarities involves identification of "close" tuples, where closeness is evaluated using a variety of similarity functions chosen to suit the do...