We propose a new method for grouping the partials produced by each instrument in a polyphonic audio mixture. The method applies to pitched, harmonic instruments and is particularly well suited to the singing voice. We model the time-varying frequency of each partial as a slowly varying frequency plus a sinusoidal modulation. The parameters of this model, together with common Auditory Scene Analysis principles, define a similarity measure between partials. This multi-criterion measure is used to build the similarity matrix given as input to a clustering algorithm. The resulting clusters are groups of harmonically related partials. We evaluate how well the method groups partials by source when one of the sources is a singing voice, and show that partial clustering is a promising approach for singing voice detection and separation.
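
To make the frequency model concrete, the following minimal Python sketch decomposes a partial's frequency trajectory into a slow trend plus one sinusoidal modulation, then compares partials by their modulation parameters. It is a sketch under stated assumptions, not the method evaluated here: the polynomial trend, the FFT peak picking, the Gaussian similarity on modulation rate and relative extent, and all names (`fit_partial`, `similarity_matrix`, `poly_order`, `sigma_rate`, `sigma_extent`) are hypothetical choices made for illustration.

```python
import numpy as np

def fit_partial(freqs, frame_rate, poly_order=2):
    """Split one frequency trajectory (Hz per frame) into trend + sinusoid.

    Returns (trend, rate_hz, extent_hz, phase).
    """
    freqs = np.asarray(freqs, dtype=float)
    n = len(freqs)
    t = np.arange(n) / frame_rate
    # Slowly varying part: low-order polynomial (an assumption of this sketch).
    trend = np.polyval(np.polyfit(t, freqs, poly_order), t)
    residual = freqs - trend
    # Sinusoidal part: strongest non-DC bin of the windowed residual spectrum.
    window = np.hanning(n)
    spec = np.fft.rfft(residual * window)
    k = 1 + int(np.argmax(np.abs(spec[1:])))
    rate_hz = k * frame_rate / n
    extent_hz = 2.0 * np.abs(spec[k]) / window.sum()
    phase = float(np.angle(spec[k]))
    return trend, rate_hz, extent_hz, phase

def similarity_matrix(partials, frame_rate, sigma_rate=1.0, sigma_extent=0.005):
    """Pairwise similarity from modulation rate and relative extent.

    The Gaussian kernels and their widths are illustrative assumptions.
    """
    feats = []
    for freqs in partials:
        trend, rate, extent, _ = fit_partial(freqs, frame_rate)
        # Extent is divided by the mean frequency so that harmonics of one
        # source, whose modulation scales with frequency, become comparable.
        feats.append((rate, extent / trend.mean()))
    feats = np.array(feats)
    d_rate = feats[:, None, 0] - feats[None, :, 0]
    d_ext = feats[:, None, 1] - feats[None, :, 1]
    return np.exp(-(d_rate / sigma_rate) ** 2) * np.exp(-(d_ext / sigma_extent) ** 2)
```

Under these assumptions, two partials that share a modulation rate and a relative extent receive a similarity close to 1, while partials with different modulation receive a similarity close to 0. The resulting matrix can be passed to any clustering algorithm that accepts pairwise similarities.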