Mining and Predicting Duplication over Peer-to-Peer Query Streams



Many previous studies that mine user queries in Peer-to-Peer systems have focused on the distribution of query contents. However, little has been done toward a better understanding of the time series distribution of these queries, which is vital for system performance. To remedy this situation, this paper mines query streams using automatic time series analysis to evaluate different linear models (Box-Jenkins models and some simple windowed-mean models) for predicting the number of duplicated queries from 10 minutes to 2 hours into the future. Both the predictive power and the computational costs of these models are evaluated over 318,942,450 real-world Gnutella queries collected over 3 months. We find that the number of duplicated queries is consistently predictable. Simple, practical models such as AR perform well on prediction.
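To make the two model families named in the abstract concrete, here is a minimal Python sketch of a windowed-mean predictor and a first-order autoregressive (AR(1)) predictor fitted by ordinary least squares. It is not the authors' implementation. The function names, the choice of order 1, the sample counts, and the idea that each series element is the number of duplicated queries in one fixed-length time bin are all assumptions made for illustration.

```python
from statistics import mean


def windowed_mean_forecast(series, window):
    """Predict the next value as the mean of the last `window` observations."""
    if window <= 0 or len(series) < window:
        raise ValueError("need at least `window` observations")
    return mean(series[-window:])


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    if len(x_prev) < 2:
        raise ValueError("need at least three observations")
    m_prev, m_next = mean(x_prev), mean(x_next)
    var = sum((a - m_prev) ** 2 for a in x_prev)
    if var == 0:
        # Constant history: fall back to predicting the mean.
        return m_next, 0.0
    cov = sum((a - m_prev) * (b - m_next) for a, b in zip(x_prev, x_next))
    phi = cov / var
    return m_next - phi * m_prev, phi


def ar1_forecast(series, steps=1):
    """Iterate the fitted AR(1) recurrence `steps` bins into the future."""
    c, phi = fit_ar1(series)
    value = series[-1]
    for _ in range(steps):
        value = c + phi * value
    return value


# Hypothetical per-bin counts of duplicated queries (not data from the paper).
counts = [120, 132, 128, 140, 151, 149, 160, 158]
print(windowed_mean_forecast(counts, window=3))
print(ar1_forecast(counts, steps=1))
```

If each bin were assumed to cover 10 minutes, `steps=1` would correspond to the shortest horizon mentioned in the abstract and `steps=12` to the 2-hour horizon. The windowed mean needs no fitting at all, which illustrates the kind of trade-off between computational cost and predictive power that the abstract says the paper evaluates.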



Data Mining | Duplicated Queries | ICDM 2006 | Mining User Queries | Time Series


Added	11 Jun 2010
Updated	11 Jun 2010
Type	Conference
Year	2006
Where	ICDM
Authors	Shicong Meng, Yifeng Shao, Cong Shi, Dingyi Han, Yong Yu
