The problem of efficiently retrieving and ranking documents from a huge collection according to their relevance to a research topic is addressed. A broad class of queries is defined and, based on previous work, a parallel system architecture capable of handling them is proposed. The time cost of the steps involved in query processing is analysed and the space requirements of the data structures used are outlined. The result is a model, characterised by parameters which can be derived from machine configuration information and some simple empirical measurements, from which collection capacities and likely query processing rates may be determined for given hardware configurations. The performance of a prototype implementation for a 128 node machine is analysed in terms of the model and conclusions are drawn on the relative importance of I/O and CPU parallelism.