Deduplication is a key operation in integrating data from multiple sources. The main challenge in this task is designing a function that can resolve when a pair of records refer t...
The biological sciences are undergoing an explosion in the amount of available data. New data analysis methods are needed to deal with the data. We present work using KDD to analys...
This paper shows that the performance of a binary classifier can be significantly improved by the processing of structured unlabeled data, i.e. data are structured if knowing the ...
Past empirical work has shown that learning multiple related tasks from data simultaneously can be advantageous in terms of predictive performance relative to learning these tasks...
This paper promotes the use of supervised machine learning in laboratory settings where chemists have a large number of samples to test for some property, and are interested in id...