Indexing Shared Content in Information Retrieval Systems

Abstract. Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separately, causing shared content to be indexed multiple times. In this paper, we describe a new document representation model where related documents are organized as a tree, allowing shared content to be indexed just once. We show how this representation model can be encoded in an inverted index and we describe algorithms for evaluating free-text queries based on this encoding. We also show how our representation model applies to web, email, and newsgroup search. Finally, we present experimental results showing that our methods can provide a significant reduction in the size of an inverted index as well as in the time to build and query it.
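The tree idea in the abstract can be illustrated with a small sketch. This is not the paper's encoding or its query algorithms, which the abstract does not detail. It is a toy Python model under stated assumptions: each node stores only its own new text, a document's full content is assumed to be the text along its root-to-node path, and queries are assumed to be conjunctive. The class `SharedContentIndex` and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class SharedContentIndex:
    """Toy index: each node's own text is indexed once (illustrative only)."""

    def __init__(self):
        self.parent = {}                  # node id -> parent id (None for a root)
        self.postings = defaultdict(set)  # term -> nodes whose OWN text has the term

    def add(self, node_id, own_text, parent=None):
        # Index only the text that is new at this node; inherited text is
        # already indexed at an ancestor and is not repeated here.
        self.parent[node_id] = parent
        for term in own_text.lower().split():
            self.postings[term].add(node_id)

    def _path(self, node_id):
        # Yield the node and all of its ancestors up to the root.
        while node_id is not None:
            yield node_id
            node_id = self.parent[node_id]

    def search(self, query):
        # Assumed semantics: a document matches when every query term occurs
        # somewhere on its root-to-node path. This naive scan visits every
        # node and makes no attempt to be efficient.
        terms = query.lower().split()
        hits = []
        for node_id in self.parent:
            path = set(self._path(node_id))
            if all(self.postings.get(t, set()) & path for t in terms):
                hits.append(node_id)
        return sorted(hits)


# Hypothetical email thread: two replies share the root message's content.
idx = SharedContentIndex()
idx.add("msg1", "budget meeting moved to friday")
idx.add("msg2", "friday works for me", parent="msg1")
idx.add("msg3", "can we do monday instead", parent="msg1")

print(idx.search("budget monday"))  # ['msg3']
print(idx.postings["budget"])       # {'msg1'}
```

In this sketch the term "budget" is stored once, at the root, yet a query can still match a reply through its ancestor. That is the kind of saving the abstract attributes to indexing shared content once.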

Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Michael Herscovici, Ronny Lempel, John McPherson, Runping Qi, Eugene J. Shekita

Database | Document Representation Model | EDBT 2006 | Inverted Index | Modern Document Collections |

» Content-Based Retrieval of Web Pages and Other Hierarchical Objects with Self-organizing Map...

» Generation of multimedia TV news contents for WWW

» Content-Based Image Retrieval in Astronomy

» iCLEF 2006 Overview Searching the Flickr WWW Photo-Sharing Repository

» An Environment to Test Progressive Refinement of Indexing for Content-Based Image Retrieval

» A scalable service for photo annotation sharing and search

» Mobile media metadata metadata creation system for mobile images

» Content-Based Indexing and Retrieval of Audio Data using Wavelets

Post Info

Added	08 Dec 2009
Updated	08 Dec 2009
Type	Conference
Year	2006
Where	EDBT
Authors	Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Michael Herscovici, Ronny Lempel, John McPherson, Runping Qi, Eugene J. Shekita