We observed that for multimedia data – especially music - collaborative similarity measures perform much better than similarity measures derived from content-based sound feature...
As the study of graphs, such as web and social graphs, becomes increasingly popular, the requirements of efficiency and programming flexibility of large graph processing tasks c...
We show that the complexity of the recently introduced medoid-shift algorithm in clustering N points is O(N2 ), with a small constant, if the underlying distance is Euclidean. This...
Farsite is a scalable, distributed file system that logically functions as a centralized file server but that is physically implemented on a set of client desktop computers. Farsi...
We examine the learning-curve sampling method, an approach for applying machinelearning algorithms to large data sets. The approach is based on the observation that the computatio...