Cloning in software systems is known to create problems during software maintenance. Several techniques have been proposed to detect the same or similar code fragments in software...
Feature selection is an important task in order to achieve better generalizability in high dimensional learning, and structure learning of Markov random fields (MRFs) can automat...
A string similarity join finds similar pairs between two collections of strings. It is an essential operation in many applications, such as data integration and cleaning, and has ...
Bayesian subspace analysis (BSA) has been successfully applied in data mining and pattern recognition. However, due to the use of probabilistic measure of similarity, it often need...
Automatic classification of documents is an important area of research with many applications in the fields of document searching, forensics and others. Methods to perform classif...