
ICDE
2005
IEEE

Finding (Recently) Frequent Items in Distributed Data Streams

We consider the problem of maintaining frequency counts for items occurring frequently in the union of multiple distributed data streams. Naïve methods of combining approximate frequency counts from multiple nodes tend to result in excessively large data structures that are costly to transfer among nodes. To minimize communication requirements, the degree of precision maintained by each node while counting item frequencies must be managed carefully. We introduce the concept of a precision gradient for managing precision when nodes are arranged in a hierarchical communication structure. We then study the optimization problem of how to set the precision gradient so as to minimize communication, and provide optimal solutions that minimize worst-case communication load over all possible inputs. We then introduce a variant designed to perform well in practice, with input data that does not conform to worst-case characteristics. We verify the effectiveness of our approach empirically using...
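To make the idea of a precision gradient concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes a simple prune-by-threshold summary, a tree whose leaves are lists of items and whose internal nodes are tuples of children, and a hypothetical `gradient` list giving the cumulative error tolerance allowed at each height (non-decreasing from leaves to root). The names `summarize` and `frequent_items` are illustrative only.

```python
from collections import Counter


def summarize(node, gradient):
    """Return (counts, total, height) for the subtree rooted at `node`.

    Assumed layout (not from the paper): a leaf is a list of items,
    an internal node is a tuple of child nodes. gradient[h] is the
    cumulative relative error allowed at height h.
    """
    if isinstance(node, list):
        counts, total, h = Counter(node), len(node), 0
    else:
        counts, total, h = Counter(), 0, 0
        for child in node:
            c_counts, c_total, c_h = summarize(child, gradient)
            counts.update(c_counts)  # add the children's approximate counts
            total += c_total
            h = max(h, c_h + 1)

    # Only the extra slack granted at this height may be spent on pruning,
    # so the accumulated undercount stays within gradient[h] * total.
    prev = gradient[h - 1] if h > 0 else 0.0
    threshold = (gradient[h] - prev) * total
    kept = Counter({item: c for item, c in counts.items() if c > threshold})
    return kept, total, h


def frequent_items(root, gradient, support):
    """Report items whose estimated frequency is at least support - eps."""
    counts, total, h = summarize(root, gradient)
    eps = gradient[h]
    return {item for item, c in counts.items() if c >= (support - eps) * total}


if __name__ == "__main__":
    leaf1 = ["a"] * 60 + ["b"] * 32 + ["x"] * 8
    leaf2 = ["a"] * 50 + ["c"] * 46 + ["y"] * 4
    print(frequent_items((leaf1, leaf2), [0.05, 0.1], support=0.5))
```

In this sketch, every count that reaches the root underestimates the true count by at most `gradient[h] * total`, so an item whose true frequency is at least `support` is never missed, although items with frequency between `support - eps` and `support` may also be reported. Rare items are dropped before they travel up the hierarchy, which is where the communication saving comes from; how to choose the gradient values well is the optimization question the abstract describes.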
Added 01 Nov 2009
Updated 01 Nov 2009
Type Conference
Year 2005
Where ICDE
Authors Amit Manjhi, Vladislav Shkapenyuk, Kedar Dhamdhere, Christopher Olston