Using synthetic data safely in classification

When is it safe to use synthetic data in supervised classification? Trainable classifier technologies require large, representative training sets consisting of samples labeled with their true class. Acquiring such training sets is difficult and costly. One way to alleviate this problem is to enlarge training sets by generating artificial, synthetic samples. Of course, this immediately raises many questions, perhaps the first being "Why should we trust artificially generated data to be an accurate representative of the real distributions?" Other questions include "When will training on synthetic data work as well as, or better than, training on real data?" We distinguish between sample space (the set of real samples), parameter space (all samples that can be generated synthetically), and finally, feature space (the set of samples in terms of finite numerical values). In this paper, we discuss a series of experiments, in which we produced synthetic data in parameter ...

Jean Nonnemaker, Henry Baird

Document Analysis | DRR 2009 | Real Data | Synthetic Data | Training |

Post Info

Added	17 Feb 2011
Updated	17 Feb 2011
Type	Journal
Year	2009
Where	DRR
Authors	Jean Nonnemaker, Henry Baird
