Web data integration is an important preprocessing step for web mining. It is highly likely that several records on the web whose textual representations differ may represent the ...
A similarity join correlating fragments in XML documents, which are similar in structure and content, can be used as the core algorithm to support data cleaning and data integratio...
The dimensionality curse has profound e ects on the effectiveness of high-dimensional similarity indexing from the performance perspective. One of the well known techniques for im...
Abstract. In this paper, we propose a new bulk-loading technique for high-dimensional indexes which represent an important component of multimedia database systems. Since it is ver...
A string similarity join finds similar pairs between two collections of strings. It is an essential operation in many applications, such as data integration and cleaning, and has ...