Performance Analysis of Distributed Architectures to Index One Terabyte of Text

We simulate different architectures of a distributed Information Retrieval system on a very large Web collection, in order to work out the optimal setting for a particular set of resources. We analyse the effectiveness of a distributed, replicated and clustered architecture using a variable number of workstations. A collection of approximately 94 million documents and 1 terabyte of text is used to test the performance of the different architectures. We show that in a purely distributed architecture, the brokers become the bottleneck due to the high number of local answer sets to be sorted. In a replicated system, the network is the bottleneck due to the high number of query servers and the continuous data interchange with the brokers. Finally, we demonstrate that a clustered system will outperform a replicated system if a large number of query servers is used, mainly due to the reduction of the network load.
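
As a rough illustration of why the brokers can become the bottleneck in the purely distributed setting, the sketch below shows a broker combining the ranked local answer sets returned by several query servers into one global top-k list. The abstract does not specify how the brokers merge results, so the function name `merge_answer_sets`, the `(score, doc_id)` tuple layout, and the sample scores are all assumptions made for illustration, not the authors' implementation.

```python
import heapq
from itertools import islice


def merge_answer_sets(local_answer_sets, k):
    """Merge per-server ranked lists into one global top-k list.

    Hypothetical helper: each local answer set is assumed to be a list of
    (score, doc_id) pairs already sorted by descending score.
    """
    # reverse=True tells heapq.merge that every input is sorted from
    # largest to smallest, so the merged stream is also descending.
    merged = heapq.merge(*local_answer_sets, key=lambda hit: hit[0], reverse=True)
    # Only the first k merged hits are needed for the final answer.
    return list(islice(merged, k))


# Three hypothetical query servers, each holding one partition of the collection.
server_a = [(0.92, "doc17"), (0.40, "doc03")]
server_b = [(0.88, "doc42"), (0.75, "doc08")]
server_c = [(0.61, "doc99")]

top3 = merge_answer_sets([server_a, server_b, server_c], k=3)
print(top3)
```

Every query makes the broker touch one answer set per query server, so under this assumed design the broker's work per query grows with the number of servers, which matches the bottleneck the abstract describes.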

Fidel Cacheda, Vassilis Plachouras, Iadh Ounis

ECIR 2004 | Information Retrieval System | Information Technology | Large Web Collection | Query Servers

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2004
Where	ECIR
Authors	Fidel Cacheda, Vassilis Plachouras, Iadh Ounis
