Combining HMM-based melody extraction and NMF-based soft masking for separating voice and accompaniment from monaural audio