Broder et al.’s [3] shingling algorithm and Charikar’s [4] random projection based approach are considered “state-of-theart” algorithms for finding near-duplicate web pag...
Abstract. Noise significantly affects cluster quality. Conventional clustering methods hardly detect clusters in a data set containing a large amount of noise. Projected clusterin...
Jiuyong Li, Xiaodi Huang, Clinton Selke, Jianming ...
Abstract. Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have ...
We present a novel randomized approach to the graph isomorphism problem. Our algorithm aims at solving difficult instances by producing randomized certificates for non-isomorphis...
We present the first randomized polynomial-time simplex algorithm for linear programming. Like the other known polynomial-time algorithms for linear programming, its running time ...