Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPU) has been at the heart of numerous studies in both academia and industry. In thi...
Wigderson and Xiao presented an efficient derandomization of the matrix Chernoff bound using the method of pessimistic estimators [WX08]. Building on their construction, we prese...
We present and evaluate GPU Bucket Sort, a parallel deterministic sample sort algorithm for many-core GPUs. Our method is considerably faster than Thrust Merge (Satish et.al., Pro...
We analyze the convergence of randomized trace estimators. Starting at 1989, several algorithms have been proposed for estimating the trace of a matrix by 1 M M i=1 zT i Azi, where...
Given a matrix A, it is often desirable to find a good approximation to A that has low rank. We introduce a simple technique for accelerating the computation of such approximation...