Parallel Generation of Inverted Files for Distributed Text Collections


We present a scalable algorithm for the parallel computation of inverted files for large text collections. The algorithm targets a high-bandwidth network of workstations with a shared-nothing memory organization. The text collection is assumed to be evenly distributed among the disks of the workstations. Compression is used to save space in main memory (where the inverted lists are kept) and to save time when data have to be moved across the network. The average running cost of the algorithm is O(t/p), where t is the size of the whole text collection and p is the number of available processors. We implemented the algorithm and evaluated it experimentally. On a 100 Mbit/s switched Ethernet network of four 200 MHz Pentium Pro workstations, each with 128 megabytes of RAM, we were able to invert 2 gigabytes of TREC documents in 15 minutes. We also propose an analytical model of the algorithm's execution time.
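The abstract does not spell out the phases of the algorithm, so the following is only a minimal sketch of one plausible shared-nothing scheme, not the authors' method. It assumes three phases: each processor inverts its local documents, posting lists are exchanged in compressed form so that each term ends up on a single owner, and each owner merges what it receives. All function names are hypothetical, the "processors" are simulated in one process, and variable-byte coding of document-id gaps stands in for whatever compression the paper actually uses.

```python
from collections import defaultdict

def invert_local(docs):
    """Build term -> sorted doc-id list for one workstation's local documents."""
    index = defaultdict(list)
    for doc_id, text in sorted(docs.items()):
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    return index

def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte, high bit marks the last byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)

def vbyte_decode(data):
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:                      # terminating byte of this number
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers

def compress_postings(doc_ids):
    """Store gaps between sorted doc ids; small gaps need fewer bytes."""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)

def decompress_postings(data):
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids

def parallel_invert(partitions):
    """Simulate p shared-nothing processors; assumes globally unique doc ids."""
    p = len(partitions)
    # Phase 1 (assumed): every processor inverts its own share, about t/p text.
    local = [invert_local(docs) for docs in partitions]
    # Phase 2 (assumed): send each compressed list to the term's owner.
    inbox = [defaultdict(list) for _ in range(p)]
    for index in local:
        for term, ids in index.items():
            owner = sum(term.encode()) % p    # deterministic toy hash
            inbox[owner][term].append(compress_postings(ids))
    # Phase 3 (assumed): each owner merges the lists it received.
    result = []
    for box in inbox:
        merged = {}
        for term, chunks in box.items():
            ids = sorted(i for c in chunks for i in decompress_postings(c))
            merged[term] = compress_postings(ids)
        result.append(merged)
    return result
```

With an evenly distributed collection, the local inversion in phase 1 touches roughly t/p of the text per processor, which is the intuition behind the O(t/p) average cost stated in the abstract. For scale, the reported result of 2 gigabytes in 15 minutes corresponds to an aggregate rate of about 2.3 megabytes per second, or a little under 0.6 megabytes per second per workstation.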

Berthier A. Ribeiro-Neto, Joao Paulo Kitajima, Gonzalo Navarro, Cláudio R. G. Sant'Ana, Nivio Ziviani


Algorithm | Scalable Algorithm | SCCC 1998 | Text Collection | Theoretical Computer Science |


» Distributed Query Processing Using Partitioned Inverted Files

» QVI Query-based virtual index for distributed information retrieval

» Efficient Metadata Generation to Enable Interactive Data Discovery over Large-Scale Scienti...

» A Search Engine Accepting On-Line Updates

» Managing Large Scale Data for Earthquake Simulations

» An Efficient MPI-IO for Noncontiguous Data Access over InfiniBand

» Performance Analysis of a Distributed Question/Answering System

» A Scalable Indexing Mechanism for Ontology-Based Information Integration


Added	05 Aug 2010
Updated	05 Aug 2010
Type	Conference
Year	1998
Where	SCCC
Authors	Berthier A. Ribeiro-Neto, Joao Paulo Kitajima, Gonzalo Navarro, Cláudio R. G. Sant'Ana, Nivio Ziviani
