In this paper, we propose SPRITE (Selective PRogressive Index Tuning by Examples), a scalable system for text retrieval in a structured P2P network. Under SPRITE, each peer is responsible for a certain number of terms. For each document, however, SPRITE learns from past queries to select only a small set of representative terms for indexing, and these terms are progressively refined with subsequent queries. We implemented the proposed strategy and compared its retrieval effectiveness, in terms of both precision and recall, against a static scheme (without learning) and a centralized system (the ideal case). Our experimental results show that SPRITE is nearly as effective as the centralized system and considerably outperforms the static scheme.
Yingguang Li, H. V. Jagadish, Kian-Lee Tan
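
As a rough illustration of the idea summarized in the abstract, the following minimal sketch selects index terms for a document from past queries and maps each term to a peer. It is not the SPRITE algorithm itself: the scoring rule (how many past queries mention a term), the tie-breaking, the function names select_index_terms and responsible_peer, and the hash-modulo peer assignment are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
import hashlib


def select_index_terms(doc_terms, past_queries, k):
    """Pick up to k document terms that appear in the most past queries.

    Assumed scoring rule for illustration only; the abstract does not
    state how representative terms are actually chosen.
    """
    doc_vocab = set(doc_terms)
    counts = Counter()
    for query in past_queries:
        # Count each term at most once per query.
        for term in set(query):
            if term in doc_vocab:
                counts[term] += 1
    # Highest query count first; alphabetical order breaks ties.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return ranked[:k]


def responsible_peer(term, num_peers):
    """Map a term to a peer id by hashing (a stand-in for a DHT lookup)."""
    digest = hashlib.sha1(term.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_peers


# Hypothetical document and query log.
doc = ["peer", "index", "query", "text", "retrieval", "network"]
queries = [["text", "retrieval"], ["retrieval", "index"], ["image", "search"]]

initial_terms = select_index_terms(doc, queries, k=2)

# Progressive refinement: re-select once more queries have been observed.
queries += [["text", "network"], ["text"]]
refined_terms = select_index_terms(doc, queries, k=2)

# Each selected term would be indexed at the peer responsible for it.
placement = {t: responsible_peer(t, num_peers=8) for t in refined_terms}
```

Re-running the selection as the query log grows is one simple way to model progressive refinement; a real system would more plausibly update the index incrementally rather than recompute it from scratch.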