We present a new parallel algorithm to compute an exact triangularization of large matrices, square or rectangular, dense or sparse, over any field. Using fast matrix multiplication, our algorithm attains the best known sequential arithmetic complexity. Furthermore, on distributed architectures, it drastically reduces the total volume of communication compared to previously known algorithms. The resulting triangular matrix can be used to compute the rank or to solve a linear system. Over finite fields, for instance, our method has proven useful in the computation of large Gr