Finding global icebergs over distributed data sets

Finding icebergs (items whose frequency of occurrence is above a certain threshold) is an important problem with a wide range of applications. Most of the existing work focuses on iceberg queries at a single node. However, in many real-life applications, data sets are distributed across a large number of nodes. Two naive approaches might be considered. In the first, each node ships its entire data set to a central server, and the central server uses single-node algorithms to find icebergs. However, this may incur prohibitive communication overhead. In the second, each node submits its local icebergs, and the central server combines them to find global icebergs. This approach may fail because, in many important applications, globally frequent items may not be frequent at any single node. In this work, we propose two novel schemes that provide accurate and efficient solutions to this problem: a sampling-based scheme and a counting-sketch-based scheme. In particular, the latter scheme incurs a commun...
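
The failure mode of the second naive approach can be illustrated with a small sketch. This is not the paper's sampling or sketch-based scheme; it only contrasts the two naive approaches described above. The function names, the toy data, and the choice to apply the same threshold `T` locally and globally are all assumptions made for illustration.

```python
from collections import Counter

def icebergs(counts, threshold):
    """Return the items whose count is strictly above the threshold."""
    return {item for item, c in counts.items() if c > threshold}

def ship_everything(nodes, threshold):
    """Naive approach 1: the server aggregates every node's full data set."""
    total = Counter()
    for items in nodes:
        total.update(items)
    return icebergs(total, threshold)

def combine_local_icebergs(nodes, threshold):
    """Naive approach 2: each node reports only its local icebergs.

    Assumption: nodes apply the same threshold locally as the server
    applies globally.
    """
    total = Counter()
    for items in nodes:
        local = Counter(items)
        for item in icebergs(local, threshold):
            total[item] += local[item]
    return icebergs(total, threshold)

# Hypothetical toy data: "a" is spread thinly over all nodes,
# "b" is concentrated on the first node.
nodes = [
    ["a", "a"] + ["b"] * 6,
    ["a", "a"],
    ["a", "a"],
]
T = 5

exact = ship_everything(nodes, T)
approx = combine_local_icebergs(nodes, T)
```

With this toy data, `exact` contains both `a` and `b` (each occurs 6 times globally), while `approx` contains only `b`: item `a` never exceeds the threshold at any single node, so no node reports it.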

Qi Zhao, Mitsunori Ogihara, Haixun Wang, Jun Xu

Real-time Traffic

Central Server | Combines Local Icebergs | Database | Global Icebergs | PODS 2006 |

Post Info

Added	08 Dec 2009
Updated	08 Dec 2009
Type	Conference
Year	2006
Where	PODS
Authors	Qi Zhao, Mitsunori Ogihara, Haixun Wang, Jun Xu
