A string similarity join finds similar pairs between two collections of strings. It is an essential operation in many applications, such as data integration and cleaning, and has ...
Clustering has become an increasingly important task in modern application domains. In many areas, e.g. when clustering complex objects, in distributed clustering, or whe...
In data integration applications, a join matches elements that are common to two data sources. Often, however, elements are represented slightly differently in each source, so an app...
Arc-annotated sequences are useful in representing the structural information of RNA sequences. In general, RNA secondary and tertiary structures can be represented as a set of ne...
We give near-tight bounds for estimating the edit distance between two non-repetitive strings (Ulam distance) with constant approximation, in sub-linear time. For two strings of l...