We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P of size n, our goal is to find a set C of size k such t...
Mining discrete patterns in binary data is important for subsampling, compression, and clustering. We consider rankone binary matrix approximations that identify the dominant patt...
All Netflix Prize algorithms proposed so far are prohibitively costly for large-scale production systems. In this paper, we describe an efficient dataflow implementation of a coll...
Srivatsava Daruru, Nena M. Marin, Matt Walker, Joy...
Abstract—We describe a novel application of using data mining and statistical learning methods to automatically monitor and detect abnormal execution traces from console logs in ...
Wei Xu, Ling Huang, Armando Fox, David Patterson, ...
The burgeoning amount of textual data in distributed sources combined with the obstacles involved in creating and maintaining central repositories motivates the need for effective ...
Shenzhi Li, Christopher D. Janneck, Aditya P. Bela...