Indexing Uncertain Categorical Data



Uncertainty in categorical data is commonplace in many applications, including data cleaning, database integration, and biological annotation. In such domains, the correct value of an attribute is often unknown, but may be selected from a reasonable number of alternatives. Current database management systems do not provide a convenient means for representing or manipulating this type of uncertainty. In this paper we extend traditional systems to explicitly handle uncertainty in data values. We propose two index structures for efficiently searching uncertain categorical data, one based on the R-tree and another based on an inverted index structure. Using these structures, we provide a detailed description of the probabilistic equality queries they support. Experimental results using real and synthetic datasets demonstrate how these index structures can effectively improve the performance of queries through the use of internal probabilistic information.
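As an illustration of the inverted-index idea, the following is a minimal sketch and not the authors' implementation. It makes several assumptions that the abstract does not state: an uncertain categorical value is modeled as a dictionary mapping each alternative to its probability, two values are treated as independent so that the probability they are equal is the sum of the products of their per-category probabilities, and the query returns every tuple whose equality probability meets a threshold. The names InvertedIndex, equality_probability, and threshold_query are hypothetical.

```python
from collections import defaultdict


def equality_probability(u, v):
    """Pr(u = v) for two uncertain values, each a {category: probability} dict.

    Assumption: u and v are independent, so the probability of equality
    is the sum over categories of the product of the two probabilities.
    """
    return sum(p * v.get(cat, 0.0) for cat, p in u.items())


class InvertedIndex:
    """Hypothetical inverted index: one posting list per category."""

    def __init__(self):
        self.postings = defaultdict(list)  # category -> [(probability, tuple_id)]
        self.tuples = {}                   # tuple_id -> full distribution

    def insert(self, tuple_id, dist):
        self.tuples[tuple_id] = dist
        for cat, p in dist.items():
            if p > 0:
                self.postings[cat].append((p, tuple_id))

    def threshold_query(self, query, tau):
        """Return (tuple_id, probability) pairs with Pr(query = tuple) >= tau."""
        # Only tuples sharing at least one category with the query can have a
        # nonzero equality probability, so the posting lists give the candidates.
        candidates = set()
        for cat, q in query.items():
            if q <= 0:
                continue
            for _, tuple_id in self.postings.get(cat, []):
                candidates.add(tuple_id)
        # Verify each candidate against its full distribution.
        scored = ((tid, equality_probability(query, self.tuples[tid]))
                  for tid in candidates)
        hits = [(tid, pr) for tid, pr in scored if pr >= tau]
        return sorted(hits, key=lambda r: (-r[1], r[0]))


# Made-up example data, for illustration only.
idx = InvertedIndex()
idx.insert("t1", {"Brake": 0.5, "Tires": 0.5})
idx.insert("t2", {"Trans": 0.2, "Suspension": 0.8})
idx.insert("t3", {"Brake": 0.9, "Exhaust": 0.1})
matches = idx.threshold_query({"Brake": 1.0}, tau=0.6)
```

The sketch uses the posting lists only to skip tuples that share no category with the query. How the paper's structures exploit the stored probabilities to prune further is not described in the abstract and is not modeled here.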



Database | ICDE 2007 | Index Structures | Probabilistic Equality Queries | Uncertain Categorical Data |


Post Info

Added	01 Nov 2009
Updated	01 Nov 2009
Type	Conference
Year	2007
Where	ICDE
Authors	Sarvjeet Singh, Chris Mayfield, Sunil Prabhakar, Rahul Shah, Susanne E. Hambrusch
