Background: When DNA microarray data are used for gene clustering, genotype/phenotype correlation studies, or tissue classification the signal intensities are usually transformed ...
In this paper, we introduce and study the Minimum Consistent Subset Cover (MCSC) problem. Given a finite ground set X and a constraint t, find the minimum number of consistent sub...
Byron J. Gao, Martin Ester, Jin-yi Cai, Oliver Sch...
Abstract—MapReduce is emerging as a generic parallel programming paradigm for large clusters of machines. This trend combined with the growing need to run machine learning (ML) a...
Amol Ghoting, Rajasekar Krishnamurthy, Edwin P. D....
Large-scale cluster-based Internet services often host partitioned datasets to provide incremental scalability. The aggregation of results produced from multiple partitions is a f...
A pervasive problem in large relational databases is identity uncertainty which occurs when multiple entries in a database refer to the same underlying entity in the world. Relati...