PODS
2015
ACM

Fast and Near-Optimal Algorithms for Approximating Distributions by Histograms

Histograms are among the most popular structures for the succinct summarization of data in a variety of database applications. In this work, we provide fast and near-optimal algorithms for approximating arbitrary one-dimensional data distributions by histograms. A k-histogram is a piecewise constant function with k pieces. We consider the following natural problem, previously studied by Indyk, Levi, and Rubinfeld [ILR12] in PODS 2012: given samples from a distribution p over {1, ..., n}, compute a k-histogram that minimizes the ℓ2-distance from p, up to an additive ε. We design an algorithm for this problem that uses the information-theoretically minimal sample size of m = O(1/ε²), runs in sample-linear time O(m), and outputs an O(k)-histogram whose ℓ2-distance from p is at most O(opt_k) + ε, where opt_k is the minimum ℓ2-distance between p and any k-histogram. Perhaps surprisingly, the sample size and running time of our algorithm are independent of the universe size n. We gene...
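To make the target quantity opt_k concrete, the sketch below computes the best k-histogram for the empirical distribution of the samples using the classic V-optimal-histogram dynamic program. This is an illustrative baseline under stated assumptions, not the authors' algorithm: it works over the whole domain {1, ..., n} in O(k·n²) time and returns an exact k-piece optimum for the empirical distribution, whereas the paper's algorithm runs in O(m) time independent of n and outputs an O(k)-histogram. The names `empirical_distribution` and `best_k_histogram` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def empirical_distribution(samples: Sequence[int], n: int) -> List[float]:
    """Empirical pmf of samples drawn from {1, ..., n}; index i-1 holds the mass of i."""
    counts = Counter(samples)
    m = len(samples)
    return [counts.get(i, 0) / m for i in range(1, n + 1)]


def best_k_histogram(q: Sequence[float], k: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Illustrative V-optimal DP (not the paper's method).

    Returns (l2 distance from q to the best histogram with min(k, len(q)) pieces,
    list of half-open index ranges [start, end)). On each piece the best constant
    is the mean of q over that piece.
    """
    n = len(q)
    k = min(k, n)
    s1 = [0.0] * (n + 1)  # prefix sums of q
    s2 = [0.0] * (n + 1)  # prefix sums of q^2
    for i, x in enumerate(q):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i: int, j: int) -> float:
        # Squared l2 error of flattening q[i:j] to its mean, in O(1) via prefix sums.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # dp[t][j]: least squared error covering q[0:j] with exactly t pieces.
    dp = [[math.inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for t in range(1, k + 1):
        for j in range(t, n + 1):
            for i in range(t - 1, j):  # last piece is q[i:j], never empty
                cand = dp[t - 1][i] + sse(i, j)
                if cand < dp[t][j]:
                    dp[t][j], cut[t][j] = cand, i

    # Walk the stored cut points back from (k, n) to recover the pieces.
    pieces, j = [], n
    for t in range(k, 0, -1):
        i = cut[t][j]
        pieces.append((i, j))
        j = i
    pieces.reverse()
    # Clamp tiny negative rounding noise before taking the square root.
    return math.sqrt(max(dp[k][n], 0.0)), pieces
```

For example, the samples `[1, 1, 2, 3, 3, 3, 4, 4]` over n = 4 give the empirical pmf `[0.25, 0.125, 0.375, 0.25]`, and `best_k_histogram(q, 2)` splits it into `[(0, 2), (2, 4)]` at ℓ2 error 0.125. The quadratic-in-n cost of this exact DP is exactly what a sample-linear method avoids.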
Jayadev Acharya, Ilias Diakonikolas, Chinmay Hegde
Added: 16 Apr 2016
Updated: 16 Apr 2016
Type: Journal
Year: 2015
Where: PODS
Authors: Jayadev Acharya, Ilias Diakonikolas, Chinmay Hegde, Jerry Zheng Li, Ludwig Schmidt