
ICDE
2003
IEEE

Distance Based Indexing for String Proximity Search

In many database applications involving string data, it is common to have near neighbor queries (asking for strings that are similar to a query string) or nearest neighbor queries (asking for strings that are most similar to a query string). The similarity between strings is defined in terms of a distance function determined by the application domain. The most popular string distance measures are based on a (weighted) count of (i) character edit or (ii) block edit operations needed to transform one string into the other. Examples include the Levenshtein edit distance and the recently introduced compression distance. The main goal of this paper is to develop efficient near(est) neighbor search tools that work for both character and block edit distances. Our premise is that distance-based indexing methods, which were originally designed for metric distances, can be modified for string distance measures, provided that they form almost metrics. We show that several distance measures, such as the c...
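As a rough illustration of the two ingredients the abstract names, the sketch below pairs a unit-cost Levenshtein edit distance with a small distance-based index that answers near neighbor queries by pruning with the triangle inequality. The index is a BK-tree, chosen here only because it is compact; it is an assumption of this example and not necessarily the structure used in the paper. The names `levenshtein`, `BKTree`, `add`, and `near` are hypothetical, and the sketch relies on a true metric rather than the "almost metric" distances the paper targets.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost character edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


class BKTree:
    """Illustrative metric index (an assumption, not the paper's structure).

    Each node is a (word, children) pair; children are keyed by their
    distance to the node's word.
    """

    def __init__(self, words=()):
        self.root = None
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def near(self, query: str, radius: int):
        """Return sorted (distance, word) pairs with distance <= radius."""
        hits = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = levenshtein(query, word)
            if d <= radius:
                hits.append((d, word))
            # Triangle inequality: every word x under key k has
            # d(x, word) == k, so a match requires |d - k| <= radius.
            for k, child in children.items():
                if d - radius <= k <= d + radius:
                    stack.append(child)
        return sorted(hits)


words = ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]
tree = BKTree(words)
print(tree.near("book", 1))
```

The pruning step is only safe because Levenshtein distance satisfies the triangle inequality exactly; for a distance that is merely an almost metric, the search window around `d` would have to be widened by whatever slack that distance allows.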
Added 01 Nov 2009
Updated 01 Nov 2009
Type Conference
Year 2003
Where ICDE
Authors Jai Macker, Murat Tasan, Süleyman Cenk Sahinalp, Z. Meral Özsoyoglu