Distance Based Indexing for String Proximity Search

In many database applications involving string data, it is common to have near neighbor queries (asking for strings that are similar to a query string) or nearest neighbor queries (asking for strings that are most similar to a query string). The similarity between strings is defined in terms of a distance function determined by the application domain. The most popular string distance measures are based on a (weighted) count of (i) character edit or (ii) block edit operations needed to transform one string into the other. Examples include the Levenshtein edit distance and the recently introduced compression distance. The main goal of this paper is to develop efficient near(est) neighbor search tools that work for both character and block edit distances. Our premise is that distance-based indexing methods, which were originally designed for metric distances, can be modified for string distance measures, provided that they form almost metrics. We show that several distance measures, such as the c...
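To make the idea of distance-based indexing concrete, here is a minimal Python sketch. It is not the paper's method: it assumes the plain unit-cost Levenshtein distance (a true metric) and a single hand-picked pivot string, and it uses the triangle inequality to skip candidates during a near neighbor query. The names `levenshtein`, `build_pivot_index`, and `near_neighbors` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Unit-cost character edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def build_pivot_index(strings: Iterable[str], pivot: str) -> Dict[str, int]:
    """Precompute the distance from every stored string to one pivot (assumed setup)."""
    return {s: levenshtein(s, pivot) for s in strings}


def near_neighbors(query: str, radius: int, pivot: str,
                   index: Dict[str, int]) -> List[str]:
    """Return stored strings within `radius` of `query`, pruning with the pivot."""
    dq = levenshtein(query, pivot)
    hits = []
    for s, ds in index.items():
        # Triangle inequality: d(query, s) >= |d(query, pivot) - d(s, pivot)|,
        # so a gap larger than the radius rules s out without computing d(query, s).
        if abs(dq - ds) > radius:
            continue
        if levenshtein(query, s) <= radius:
            hits.append(s)
    return hits


if __name__ == "__main__":
    words = ["kitten", "sitten", "sitting", "bitten", "flaw"]
    pivot = "mitten"  # arbitrary pivot, chosen by hand for this sketch
    index = build_pivot_index(words, pivot)
    print(near_neighbors("kitten", 2, pivot, index))
```

The pruning step is only safe because Levenshtein distance satisfies the triangle inequality. For a distance that is merely an almost metric, as the abstract discusses, the pruning bound would presumably need to be relaxed; that adjustment is not shown here.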

Jai Macker, Murat Tasan, Süleyman Cenk Sahinalp, Z. Meral Özsoyoglu

Database | ICDE 2003 | Levenshtein Edit Distance | Popular String Distance | String Distance Measures |

Added	01 Nov 2009
Updated	01 Nov 2009
Type	Conference
Year	2003
Where	ICDE
Authors	Jai Macker, Murat Tasan, Süleyman Cenk Sahinalp, Z. Meral Özsoyoglu
