Separating voice from music is an interesting but difficult problem, and it is useful for related research areas such as audio content analysis. In this paper, the difference between voice and music signals is carefully studied, and harmonic structure stability is proposed as the key difference between them. A separation algorithm based on this observation is presented. The main idea is to learn the average harmonic structure of the music and then use it to distinguish music harmonic structures from voice harmonic structures, so that the two signals can be separated. Experimental results show that the algorithm can separate mixed signals, obtaining not only a very high Signal-to-Noise Ratio (SNR) but also good subjective audio quality.
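The main idea can be illustrated with a minimal sketch. It is not the algorithm of this paper: it assumes that harmonic amplitudes have already been extracted for each frame, and the function names, the RMS distance measure, and the 3 dB threshold are all hypothetical choices made for illustration. The sketch averages the normalized harmonic structures of music frames, labels a new frame as music when its structure stays close to that average, and computes the SNR of a separated signal against a reference.

```python
import numpy as np

def normalize_structure(amps):
    """Express harmonic amplitudes in dB relative to the strongest harmonic."""
    amps = np.asarray(amps, dtype=float)
    log_amps = 20.0 * np.log10(np.maximum(amps, 1e-12))
    return log_amps - log_amps.max(axis=-1, keepdims=True)

def average_structure(structures):
    """Average harmonic structure over frames (rows)."""
    return np.mean(structures, axis=0)

def structure_distance(structure, reference):
    """RMS difference in dB between two harmonic structures (assumed measure)."""
    return float(np.sqrt(np.mean((structure - reference) ** 2)))

def is_music(structure, reference, threshold_db=3.0):
    """Label a frame as music if it is close to the learned average.
    The threshold value is an assumption, not taken from the paper."""
    return structure_distance(structure, reference) < threshold_db

def snr_db(reference, estimate):
    """Signal-to-Noise Ratio in dB of an estimate against a reference signal."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    noise = reference - estimate
    return float(10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2)))

# Synthetic example: an instrument whose harmonic structure barely changes.
music_frames = normalize_structure([
    [1.00, 0.50, 0.25, 0.125],
    [0.98, 0.52, 0.24, 0.130],
    [1.02, 0.49, 0.26, 0.120],
])
learned = average_structure(music_frames)

# A frame with a very different harmonic structure, standing in for voice.
voice_frame = normalize_structure([0.3, 1.0, 0.2, 0.6])

print(is_music(music_frames[0], learned))
print(is_music(voice_frame, learned))
```

In this toy setting the stable music frames fall well inside the threshold, while the frame with a different structure does not. A real system would also need pitch tracking and harmonic extraction, which are outside the scope of the sketch.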