Complex simulations can generate very large amounts of data stored disjointly across many local disks. Learning from this data can be problematic because labels are difficult to obtain. We present an algorithm for applying semi-supervised learning to disjoint data generated by complex simulations. Our semi-supervised technique shows a statistically significant accuracy improvement over supervised learning using the same underlying learning algorithm, and it requires less labeled data for comparable results.
John Nicholas Korecki, Kevin W. Bowyer, Larry O. H