Fast Indexes and Algorithms for Set Similarity Selection Queries


Data collections often contain inconsistencies that arise for a variety of reasons, and it is desirable to identify and resolve them efficiently. Set similarity queries are commonly used in data cleaning to match similar data. In this work we concentrate on set similarity selection queries: given a query set, retrieve all sets in a collection whose similarity to the query exceeds some threshold. Various set similarity measures have been proposed for data cleaning. We focus on weighted similarity functions like TF/IDF, and introduce variants that are well suited for set similarity selections in a relational database context. These variants have special semantic properties that can be exploited to design very efficient index structures and algorithms for answering queries. We present modifications of existing technologies to work for set similarity selection queries. We also introduce three novel algorithms based on the Threshold ...
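To make the query type in the abstract concrete, here is a minimal Python sketch of a set similarity selection: it returns every set in a collection whose weighted similarity to a query set exceeds a threshold. This is not the paper's method. The weighting (a plain IDF-weighted cosine over sets), the simple inverted index used for candidate generation, and all function names (`idf_weights`, `build_index`, `select`) are assumptions chosen for illustration only; the paper's own similarity variants and Threshold-based algorithms are not reproduced here.

```python
import math
from collections import defaultdict


def idf_weights(collection):
    """Assumed weighting: idf(t) = log(1 + N / df(t)) over the collection."""
    n = len(collection)
    df = defaultdict(int)
    for s in collection:
        for token in s:
            df[token] += 1
    return {t: math.log(1 + n / d) for t, d in df.items()}


def norm(s, w):
    """Euclidean length of a set under weights w (unknown tokens weigh 0)."""
    return math.sqrt(sum(w.get(t, 0.0) ** 2 for t in s))


def similarity(q, s, w):
    """IDF-weighted cosine similarity between two token sets."""
    nq, ns = norm(q, w), norm(s, w)
    if nq == 0.0 or ns == 0.0:
        return 0.0
    shared = sum(w.get(t, 0.0) ** 2 for t in q & s)
    return shared / (nq * ns)


def build_index(collection):
    """Inverted index: token -> ids of the sets that contain it."""
    index = defaultdict(list)
    for sid, s in enumerate(collection):
        for token in s:
            index[token].append(sid)
    return index


def select(query, collection, index, w, threshold):
    """Return (set id, similarity) pairs with similarity > threshold."""
    # Only sets sharing at least one token with the query can score above 0.
    candidates = set()
    for token in query:
        candidates.update(index.get(token, ()))
    scored = ((sid, similarity(query, collection[sid], w)) for sid in candidates)
    hits = [(sid, sim) for sid, sim in scored if sim > threshold]
    return sorted(hits, key=lambda pair: -pair[1])
```

The sketch scores every candidate that shares a token with the query, which is the naive baseline. Specialized index structures and algorithms of the kind the abstract describes aim to avoid scoring most of those candidates.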

Marios Hadjieleftheriou, Amit Chandel, Nick Koudas, Divesh Srivastava


Database | ICDE 2008 | Set Similarity Measures | Set Similarity Selection | Similarity Selection Queries |


» Distributed Sparse Spatial Selection Indexes

» Layout indexing of trademark images

» Hashed samples: selectivity estimators for set similarity selection queries

» A Dynamic Pivot Selection Technique for Similarity Search

» LSH forest: self-tuning indexes for similarity search

» Hierarchical Bitmap Index: An Efficient and Scalable Indexing Technique for Set-Valued Attri...

» An Indexing Scheme for Fast Similarity Search in Large Time Series Databases

» MLR-Index: An Index Structure for Fast and Scalable Similarity Search in High Dimensions

Post Info

Added	01 Nov 2009
Updated	01 Nov 2009
Type	Conference
Year	2008
Where	ICDE
Authors	Marios Hadjieleftheriou, Amit Chandel, Nick Koudas, Divesh Srivastava


Sciweavers
