Space-Efficient Framework for Top-k String Retrieval Problems

Given a set 𝒟 = {d1, d2, ..., dD} of D strings of total length n, our task is to report the "most relevant" strings for a given query pattern P. This requires more advanced query functionality than ordinary pattern matching, because some notion of "most relevant" is involved. In the information retrieval literature, this task is best achieved by using inverted indexes. However, inverted indexes work only for a predefined set of patterns. In the pattern matching community, the most popular pattern-matching data structures are suffix trees and suffix arrays. However, a typical suffix tree search goes through all occurrences of the pattern over the entire string collection, which may be far more than the number of relevant documents required. The first formal framework for studying this kind of retrieval problem was given by Muthukrishnan [25]. He considered two metrics for relevance: frequency and proximity. He took a threshold-based approach on these metrics ...
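
To make the problem statement concrete, the following is a minimal sketch of the naive baseline the abstract criticizes, not the framework proposed in the paper. It assumes relevance means frequency (the number of occurrences of P in a document), stores suffixes as plain strings for readability rather than space efficiency, and uses hypothetical helper names (`pattern_range`, `top_k_by_frequency`) that do not come from the source. Note that it touches every occurrence of the pattern before it can rank documents, which is exactly the cost a top-k index aims to avoid.

```python
import heapq
from collections import Counter


def pattern_range(suffixes, pattern):
    """Return [start, end) of the sorted suffixes that begin with pattern."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffixes)
    while lo < hi:
        mid = (lo + hi) // 2
        if suffixes[mid][:m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(suffixes)
    while lo < hi:
        mid = (lo + hi) // 2
        if suffixes[mid][:m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def top_k_by_frequency(docs, pattern, k):
    """Naive top-k: count every occurrence of pattern, then rank documents.

    Returns up to k (doc_id, frequency) pairs, highest frequency first,
    ties broken by smaller doc_id. Illustrative only: building the suffix
    list this way takes quadratic space.
    """
    # Sorted suffixes of all documents, each tagged with its owner.
    entries = sorted(
        (text[i:], doc_id)
        for doc_id, text in enumerate(docs)
        for i in range(len(text))
    )
    suffixes = [suffix for suffix, _ in entries]
    owners = [doc_id for _, doc_id in entries]

    start, end = pattern_range(suffixes, pattern)
    # This step visits all occurrences, even when k is tiny.
    counts = Counter(owners[start:end])
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = ["banana", "bandana", "cabana", "apple"]
    print(top_k_by_frequency(docs, "an", 2))
```

The work done in the counting step grows with the total number of occurrences of P across the collection, regardless of k, which is the inefficiency the abstract attributes to plain suffix tree or suffix array searches.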

Wing-Kai Hon, Rahul Shah, Jeffrey Scott Vitter

Data Structures | FOCS 2009 | Pattern Matching | Suffix Tree | Theoretical Computer Science

Post Info

Added	16 Aug 2010
Updated	16 Aug 2010
Type	Conference
Year	2009
Where	FOCS
Authors	Wing-Kai Hon, Rahul Shah, Jeffrey Scott Vitter
