Top-k Ranked Document Search in General Text Databases


Text search engines return a set of k documents ranked by similarity to a query. Typically, documents and queries are drawn from natural language text, which can readily be partitioned into words, allowing optimizations of data structures and algorithms for ranking. However, in many new search domains (DNA, multimedia, OCR texts, Far East languages) there is often no obvious definition of words and traditional indexing approaches are not so easily adapted, or break down entirely. We present two new algorithms for ranking documents against a query without making any assumptions on the structure of the underlying text. We build on existing theoretical techniques, which we have implemented and compared empirically with new approaches introduced in this paper. Our best approach is significantly faster than existing methods in RAM, and is even three times faster than a state-of-the-art inverted file implementation for English text when word queries are issued.
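To make the problem statement concrete, the following is a minimal sketch of top-k ranking over text that has no word boundaries. It is a naive brute-force baseline, not either of the algorithms presented in the paper: it scans every document for the query pattern and ranks by raw occurrence count. The function names `count_occurrences` and `top_k_documents`, the frequency-only scoring, and the sample DNA-like strings are all assumptions made for illustration.

```python
import heapq


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one character so overlapping matches are counted.
        start = text.find(pattern, start + 1)
    return count


def top_k_documents(documents, pattern, k):
    """Return up to k (doc_id, frequency) pairs, highest frequency first.

    Documents that do not contain the pattern are omitted.
    Ties are broken in favour of the lower document id.
    """
    scored = (
        (count_occurrences(doc, pattern), -doc_id)
        for doc_id, doc in enumerate(documents)
    )
    # Keep only the k best scores instead of sorting every document.
    best = heapq.nlargest(k, (s for s in scored if s[0] > 0))
    return [(-neg_id, freq) for freq, neg_id in best]


# Hypothetical collection: the query is an arbitrary substring, not a word.
docs = ["ACGTACGTAC", "TTTTACGT", "GGGG", "ACACACAC"]
print(top_k_documents(docs, "AC", 2))
```

This baseline does work proportional to the total collection size on every query, which is the cost that an index built ahead of time is meant to avoid.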

J. Shane Culpepper, Gonzalo Navarro, Simon J. Puglisi, Andrew Turpin

Algorithms | Documents | ESA 2010 | Natural Language Text | Traditional Indexing Approaches

» Semantic link based top-K join queries in P2P networks

» FleXPath: Flexible Structure and Full-Text Querying for XML

» The TopX DB&IR engine

» Keyword search across databases and documents

» Presenting the results of relevance-oriented search over XML documents

» TopCells: Keyword-based search of top-k aggregated documents in text cube

» Term Ranking for Clustering Web Search Results

» Hybrid Indexing and Seamless Ranking of Spatial and Textual Features of Web Documents

Post Info

Added	09 Nov 2010
Updated	09 Nov 2010
Type	Conference
Year	2010
Where	ESA
Authors	J. Shane Culpepper, Gonzalo Navarro, Simon J. Puglisi, Andrew Turpin
