Deterministic algorithms for sampling count data

Processing and extracting meaningful knowledge from count data is an important problem in data mining. The volume of data is increasing dramatically as day-to-day activities generate market basket data, web clickstream data, and network data. Most mining and analysis algorithms require multiple passes over the data, which is extremely time-consuming. One way to save time is to work with samples, since a sample is a good surrogate for the data and the same sample can be used to answer many kinds of queries. In this paper, we propose two deterministic sampling algorithms, Biased-L2 and DRS. Both produce samples vastly superior to those of the previous deterministic and random algorithms, in both sample quality and accuracy. Our algorithms also improve on the run-time and memory footprint of the existing deterministic algorithms. The new algorithms can be used to sample from a relational database as well as data streams, with the ability to examine each transaction...
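To make the idea of a sample acting as a surrogate for count data concrete, here is a minimal Python sketch. It is not the paper's Biased-L2 or DRS algorithm, whose details are not given in this abstract. It draws a plain random sample of transactions and measures how far the sample's item frequencies are from those of the full data set using an L2 distance. The function names, the toy transactions, and the choice of L2 distance as the quality measure are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def item_frequencies(transactions):
    """Return, for each item, the fraction of transactions that contain it."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item once per transaction
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def l2_distance(freq_a, freq_b):
    """Euclidean distance between two item-frequency vectors (assumed quality measure)."""
    items = set(freq_a) | set(freq_b)
    return math.sqrt(
        sum((freq_a.get(i, 0.0) - freq_b.get(i, 0.0)) ** 2 for i in items)
    )


def simple_random_sample(transactions, k, seed=0):
    """Baseline only: a seeded random sample of k transactions, not Biased-L2 or DRS."""
    rng = random.Random(seed)
    return rng.sample(transactions, k)


if __name__ == "__main__":
    # Hypothetical market-basket data: each transaction is a list of items.
    data = [
        ["milk", "bread"],
        ["milk"],
        ["bread", "eggs"],
        ["milk", "eggs"],
        ["bread"],
        ["milk", "bread", "eggs"],
    ]
    sample = simple_random_sample(data, k=3)
    error = l2_distance(item_frequencies(data), item_frequencies(sample))
    print(f"L2 distance between sample and full-data frequencies: {error:.3f}")
```

A smaller distance means frequency queries answered from the sample are closer to the answers on the full data. A deterministic sampler would replace `simple_random_sample` with a rule that chooses which transactions to keep.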

Hüseyin Akcan, Alex Astashyn, Hervé Brönnimann

Algorithms | Clickstream Data | Deterministic Sampling Algorithms | DKE 2008

Post Info

Added	10 Dec 2010
Updated	10 Dec 2010
Type	Journal
Year	2008
Where	DKE
Authors	Hüseyin Akcan, Alex Astashyn, Hervé Brönnimann
