Fast and Near-Optimal Algorithms for Approximating Distributions by Histograms

Histograms are among the most popular structures for the succinct summarization of data in a variety of database applications. In this work, we provide fast and near-optimal algorithms for approximating arbitrary one-dimensional data distributions by histograms. A k-histogram is a piecewise constant function with k pieces. We consider the following natural problem, previously studied by Indyk, Levi, and Rubinfeld [ILR12] in PODS 2012: Given samples from a distribution p over {1, ..., n}, compute a k-histogram that minimizes the ℓ2-distance from p, up to an additive ε. We design an algorithm for this problem that uses the information-theoretically minimal sample size of m = O(1/ε²), runs in sample-linear time O(m), and outputs an O(k)-histogram whose ℓ2-distance from p is at most O(opt_k) + ε, where opt_k is the minimum ℓ2-distance between p and any k-histogram. Perhaps surprisingly, the sample size and running time of our algorithm are independent of the universe size n. We gene...
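To make the problem statement concrete, here is a minimal Python sketch. It is not the algorithm of the paper, whose details are not given in this abstract. Instead it builds the empirical distribution from samples and fits an ℓ2-optimal k-histogram (ℓ2 being the Euclidean distance between probability vectors) with the classical O(n²k) dynamic program, so unlike the result described above its running time does depend on the universe size n. The function names `empirical_distribution` and `best_k_histogram` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def empirical_distribution(samples, n):
    """Empirical pmf over {1, ..., n}, returned as a list of length n."""
    counts = Counter(samples)
    m = len(samples)
    return [counts.get(i, 0) / m for i in range(1, n + 1)]


def best_k_histogram(q, k):
    """Fit a k-piece piecewise constant function to q, minimizing squared l2 error.

    Classical O(n^2 k) dynamic program (an assumption of this sketch, not the
    paper's method). Returns (squared_error, pieces), where each piece is
    (start, end, value) over the 0-indexed half-open range q[start:end].
    """
    n = len(q)
    k = min(k, n)
    # Prefix sums of values and squared values give O(1) interval costs.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(q):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a, b):
        # Squared l2 error of replacing q[a:b] by its mean.
        total = s1[b] - s1[a]
        return max(0.0, (s2[b] - s2[a]) - total * total / (b - a))

    inf = float("inf")
    # dp[j][b]: minimum error of covering q[:b] with exactly j pieces.
    dp = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for j in range(1, k + 1):
        for b in range(j, n + 1):
            for a in range(j - 1, b):
                c = dp[j - 1][a] + cost(a, b)
                if c < dp[j][b]:
                    dp[j][b] = c
                    back[j][b] = a

    # Walk the back-pointers to recover the pieces.
    pieces = []
    b = n
    for j in range(k, 0, -1):
        a = back[j][b]
        pieces.append((a, b, (s1[b] - s1[a]) / (b - a)))
        b = a
    pieces.reverse()
    return dp[k][n], pieces


if __name__ == "__main__":
    random.seed(0)
    n, k, m = 20, 2, 5000
    # Hypothetical test distribution: a 2-histogram over {1, ..., 20}.
    p = [0.08] * 10 + [0.02] * 10
    samples = random.choices(range(1, n + 1), weights=p, k=m)
    q = empirical_distribution(samples, n)
    err, pieces = best_k_histogram(q, k)
    print(err, pieces)
```

The dynamic program returns the squared ℓ2 error; its square root is the distance between the empirical distribution and the fitted histogram. Working from the empirical distribution rather than from p itself is what introduces the additive ε term in the guarantee quoted above.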

Jayadev Acharya, Ilias Diakonikolas, Chinmay Hegde, Jerry Zheng Li, Ludwig Schmidt

Database | PODS 2015 |

Post Info

Added	16 Apr 2016
Updated	16 Apr 2016
Type	Journal
Year	2015
Where	PODS
Authors	Jayadev Acharya, Ilias Diakonikolas, Chinmay Hegde, Jerry Zheng Li, Ludwig Schmidt
