This paper proposes a novel algorithm for minimizing the perceptual distortion in audio representations based on non-negative matrix factorization (NMF). We formulate the noise-to-mask ratio (NMR) audio quality criterion in a form that can be used in NMF and propose an algorithm for optimizing it. We also propose a method for compensating for the spreading of the representation error in the synthesis filterbank. The proposed method is found to produce higher objective perceptual quality than all of the reference methods. We also study the trade-off between the window length and the rank of the factorization at a fixed data rate, and find that the best performance is obtained with window lengths between 10 and 30 ms.
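The abstract does not spell out the optimization. As a rough illustration only, the sketch below shows a generic weighted Euclidean NMF with multiplicative updates. It rests on an assumption that is ours and not a formulation stated in this paper: an NMR-style criterion is approximated by weighting the squared error in each time-frequency bin by the inverse of that bin's masking threshold. The names `weighted_nmf` and `weighted_error` are hypothetical. The masking threshold is taken as given rather than computed by a psychoacoustic model, and the filterbank error-spreading compensation is not modeled.

```python
import numpy as np


def weighted_nmf(V, Wt, rank, n_iter=200, eps=1e-12, seed=0):
    """Factorize V ~= B @ G by minimizing sum(Wt * (V - B @ G)**2).

    V  : non-negative spectrogram, shape (F, T)
    Wt : non-negative per-bin weights, shape (F, T). Assumed here (hypothetically)
         to be 1 / masking_threshold, as a stand-in for an NMR-style criterion.
    Returns basis spectra B (F, rank) and gains G (rank, T).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    # Random non-negative initialization
    B = rng.random((F, rank)) + eps
    G = rng.random((rank, T)) + eps
    WV = Wt * V  # weighted target, constant across iterations
    for _ in range(n_iter):
        # Multiplicative update for the gains; keeps G non-negative
        G *= (B.T @ WV) / (B.T @ (Wt * (B @ G)) + eps)
        # Multiplicative update for the basis spectra, using the new G
        B *= (WV @ G.T) / ((Wt * (B @ G)) @ G.T + eps)
    return B, G


def weighted_error(V, Wt, B, G):
    """Weighted squared error that the updates above try to decrease."""
    return float(np.sum(Wt * (V - B @ G) ** 2))
```

With real data, `V` would be a magnitude or power spectrogram and `Wt` would come from a masking model. The multiplicative form is chosen here because it preserves non-negativity without projection steps. It is only one of several ways to minimize a weighted NMF cost.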