The dataset generated by a large-scale numerical simulation may include thousands of timesteps and hundreds of variables describing different aspects of the modeled physical pheno...
Subspace clustering and frequent itemset mining via “step-by-step” algorithms that search the subspace/pattern lattice in a top-down or bottom-up fashion do not scale to large ...
We examine the set covering machine when it uses data-dependent half-spaces for its set of features and bound its generalization error in terms of the number of training errors an...
Mario Marchand, Mohak Shah, John Shawe-Taylor, Mar...
Random sampling is one of the most fundamental data management tools available. However, most current research involving sampling considers the problem of how to use a sample, and...
An important class of queries is the LIKE predicate in SQL. In the absence of an index, LIKE queries are subject to performance degradation. The notion of indexing on substrings (...