Similarity search and similarity join on strings are important for applications such as duplicate detection, error detection, data cleansing, or comparison of biological sequences....
B-trees are the data structure of choice for maintaining searchable data on disk. However, B-trees perform suboptimally ? when keys are long or of variable length, ? when keys are...
Michael A. Bender, Martin Farach-Colton, Bradley C...
Today's record matching infrastructure does not allow a flexible way to account for synonyms such as "Robert" and "Bob" which refer to the same name, and ...
This paper considers enumeration of substring equivalence classes introduced by Blumer et al. [1]. They used the equivalence classes to define an index structure called compact dir...
Abstract. Protein structure analysis is one of the most important research issues in the post-genomic era, and faster and more accurate query data structures for such 3-D structure...