Peer-to-peer file sharing networks have emerged as a popular new class of Internet application. In this paper, we provide an analytical model of resource size and of the contents shared at a given node. We also study how the composition of the content workload hosted in the Gnutella network changes over time. Finally, we investigate the negative impact of oversimplified hypotheses (e.g., the use of filenames as resource identifiers) on the potentially achievable hit rate of a file sharing cache. Our findings carry a clear message: a cache can reduce file sharing traffic, lowering both download time and network usage. The design and tuning of the cache server should take into account that different resources may share the same name, and should consider push-based downloads. Failing to do so can reduce the effectiveness of the caching mechanism.
Mauro Andreolini, Riccardo Lancellotti, Philip S.