
SSDBM
2007
IEEE

Adaptive-Size Reservoir Sampling over Data Streams

Reservoir sampling is a well-known technique for sequential random sampling over data streams. Conventional reservoir sampling assumes a fixed-size reservoir. There are situations, however, in which it is necessary and/or advantageous to adaptively adjust the size of a reservoir in the middle of sampling due to changes in data characteristics and/or application behavior. This paper studies adaptive-size reservoir sampling over data streams considering two main factors: reservoir size and sample uniformity. First, the paper conducts a theoretical study on the effects of adjusting the size of a reservoir while sampling is in progress. The theoretical results show that such an adjustment may have a negative impact on the probability of the sample being uniform (called uniformity confidence herein). Second, the paper presents a novel algorithm for maintaining the reservoir sample after the reservoir size is adjusted such that the resulting uniformity confidence exceeds a given threshol...
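For reference, the conventional fixed-size scheme the abstract starts from can be sketched as Vitter's Algorithm R. The Python below is a minimal sketch, not the authors' algorithm: the class name `Reservoir` and the `shrink` method are hypothetical names chosen for illustration. `shrink` shows only the simplest kind of mid-stream adjustment, reducing the size by evicting a uniformly random subset. It does not reproduce the paper's method for keeping uniformity confidence above a threshold.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class Reservoir(Generic[T]):
    """Fixed-size reservoir sampling (Vitter's Algorithm R).

    Once t >= capacity items have been offered, each item seen so far
    is in the reservoir with probability capacity / t.
    """

    def __init__(self, capacity: int, rng: Optional[random.Random] = None) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.seen = 0
        self.items: List[T] = []
        self.rng = rng or random.Random()

    def offer(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            # Fill phase: the first `capacity` items are always kept.
            self.items.append(item)
            return
        # j is uniform over 0..seen-1, so j < capacity with probability
        # capacity / seen; if so, slot j is uniform among the slots.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item

    def shrink(self, new_capacity: int) -> None:
        """Hypothetical helper: reduce the reservoir size mid-stream.

        A uniform random subset of a uniform sample is itself uniform,
        so Algorithm R can continue with the smaller capacity.
        """
        if not 1 <= new_capacity <= self.capacity:
            raise ValueError("new_capacity must be in [1, capacity]")
        if len(self.items) > new_capacity:
            self.items = self.rng.sample(self.items, new_capacity)
        self.capacity = new_capacity


if __name__ == "__main__":
    res: Reservoir[int] = Reservoir(capacity=5, rng=random.Random(42))
    for x in range(1000):
        res.offer(x)
    res.shrink(3)  # adjust the size while sampling is in progress
    for x in range(1000, 2000):
        res.offer(x)
    print(len(res.items), res.seen)  # 3 2000
```

Shrinking is the easy direction. Enlarging has no equivalent shortcut, because items that were already evicted from the stream cannot be revisited to fill the new slots.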
Added 04 Jun 2010
Updated 04 Jun 2010
Type Conference
Year 2007
Where SSDBM
Authors Mohammed Al-Kateb, Byung Suk Lee, Xiaoyang Sean Wang