Data mining applications analyze large collections of set data and high dimensional categorical data. Search on these data types is not restricted to the classic problems of minin...
Understanding and interpreting a large data source is an important but challenging operation in many technical disciplines. Computer visualization has become a valuable tool to he...
Many data mining approaches focus on the discovery of similar (and frequent) data values in large data sets. We present an alternative but complementary approach in whic...
Jeff Edmonds, Jarek Gryz, Dongming Liang, René...
Set similarity join has played an important role in many real-world applications such as data cleaning, near-duplicate detection, data integration, and so on. In these applicati...
While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more ...