A Bayes-true data generator for evaluation of supervised and unsupervised learning methods

Download www.dfki.uni-kl.de

Benchmarking pattern recognition, machine learning and data mining methods commonly relies on real-world data sets. However, using real-world data has disadvantages. On the one hand, collecting real-world data can become difficult or impossible for various reasons. On the other hand, real-world variables are hard to control, even in the problem domain; in the feature domain, where most statistical learning methods operate, exercising control is even more difficult and hence rarely attempted. This is at odds with scientific experimentation guidelines, which mandate the use of variables that are as directly controllable and as directly observable as possible. Because of this, synthetic data possesses certain advantages over real-world data sets. In this paper we propose a method that produces synthetic data with guaranteed global and class-specific statistical properties. This method is based on overlapping class densities placed on the corners of a regular k-simplex. This generator can...
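The idea of placing overlapping class densities on the corners of a regular k-simplex can be illustrated with a minimal sketch. This is not the authors' generator: the abstract does not specify the density family or its parameters, so the sketch assumes isotropic Gaussian classes with equal priors, takes the k+1 standard basis vectors of R^(k+1) as the simplex corners, and uses hypothetical names throughout (`simplex_vertices`, `sample`, `posteriors`, `bayes_label`, `sigma`). Because the generating densities are known, the exact class posteriors, and hence the Bayes-optimal label, can be computed for every point.

```python
import math
import random


def simplex_vertices(k):
    """Corners of a regular k-simplex: the k+1 standard basis vectors of R^(k+1).

    Every pair of corners is sqrt(2) apart, so all classes are equidistant.
    """
    return [[1.0 if i == j else 0.0 for j in range(k + 1)] for i in range(k + 1)]


def sample(n, k, sigma, rng):
    """Draw n labelled points; class c is an isotropic Gaussian centred on corner c.

    Assumptions (not from the paper): Gaussian densities, equal class priors.
    """
    verts = simplex_vertices(k)
    data = []
    for _ in range(n):
        c = rng.randrange(k + 1)
        x = [rng.gauss(m, sigma) for m in verts[c]]
        data.append((x, c))
    return data


def posteriors(x, k, sigma):
    """Exact posteriors P(c | x) under the known generating densities."""
    verts = simplex_vertices(k)
    log_lik = [
        -sum((xi - mi) ** 2 for xi, mi in zip(x, v)) / (2 * sigma ** 2)
        for v in verts
    ]
    top = max(log_lik)  # subtract the maximum for numerical stability
    weights = [math.exp(l - top) for l in log_lik]
    total = sum(weights)
    return [w / total for w in weights]


def bayes_label(x, k, sigma):
    """Bayes-optimal decision: the class with the largest posterior."""
    p = posteriors(x, k, sigma)
    return p.index(max(p))


rng = random.Random(0)
data = sample(2000, k=2, sigma=0.5, rng=rng)
errors = sum(bayes_label(x, 2, 0.5) != c for x, c in data)
print("empirical error of the Bayes-optimal rule:", errors / len(data))
```

Increasing `sigma` widens the overlap between classes and raises the error of the Bayes-optimal rule, which gives a controllable reference point against which a learned classifier's error can be compared.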

Janick V. Frasch, Aleksander Lodwich, Faisal Shafait, Thomas M. Breuel

Observable Variables | Pattern Recognition | PRL 2011 | Problem Domain | Synthetic Data

Post Info

Added	17 Sep 2011
Updated	17 Sep 2011
Type	Journal
Year	2011
Where	PRL
Authors	Janick V. Frasch, Aleksander Lodwich, Faisal Shafait, Thomas M. Breuel
