Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and that are thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms that take as input an m × n matrix A and a rank parameter k. In our first algorithm, C is chosen, and we let A′ = CC⁺A, where C⁺ is the Moore–Penrose generalized inverse of C. In our second algorithm, C, U, and R are chosen, and we let A′ = CUR. (C and R are matrices that consist of actual columns and rows, respectively, of A, and U is a generalized inverse of their intersection.) For each a...
Petros Drineas, Michael W. Mahoney, S. Muthukrishnan
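
To make the two forms of approximation concrete, the following is a minimal NumPy sketch of the column-based approximation A′ = CC⁺A and the CUR approximation A′ = CUR, with U taken as a generalized inverse of the intersection of C and R, as described above. The uniform random selection of columns and rows here is only a placeholder for illustration; the paper's algorithms select columns and rows using carefully constructed sampling probabilities, which are not reproduced in this sketch.

```python
import numpy as np

def cx_approximation(A, c, rng=None):
    """Sketch of a column-based approximation A' = C C^+ A.

    C consists of c actual columns of A. Columns are chosen uniformly
    at random here purely for illustration; the paper's algorithm uses
    specific sampling probabilities to obtain its error guarantees.
    """
    rng = np.random.default_rng() if rng is None else rng
    cols = rng.choice(A.shape[1], size=c, replace=False)
    C = A[:, cols]                        # actual columns of A
    return C @ np.linalg.pinv(C) @ A      # A' = C C^+ A

def cur_approximation(A, c, r, rng=None):
    """Sketch of a CUR approximation A' = C U R.

    C and R are actual columns and rows of A, and U is the
    Moore-Penrose generalized inverse of their intersection.
    """
    rng = np.random.default_rng() if rng is None else rng
    cols = rng.choice(A.shape[1], size=c, replace=False)
    rows = rng.choice(A.shape[0], size=r, replace=False)
    C = A[:, cols]                        # m x c: actual columns of A
    R = A[rows, :]                        # r x n: actual rows of A
    W = A[np.ix_(rows, cols)]             # r x c: intersection of R and C
    U = np.linalg.pinv(W)                 # c x r: generalized inverse of W
    return C @ U @ R                      # A' = C U R
```

As a quick check of the shapes involved: with A of size m × n, C is m × c, R is r × n, and U is c × r, so CUR is again m × n and can be compared directly against A (for example, via the Frobenius norm of A − A′).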