Given a set of monophonic, harmonic sound sources (e.g., human voices or wind instruments), multi-pitch estimation (MPE) is the task of determining the instantaneous pitches of each source. Multi-pitch tracking (MPT) connects the instantaneous pitch estimates produced by MPE algorithms into pitch trajectories, one per source. A trajectory can be short (a single musical note) or long (an entire piece of music). Note-level MPT methods usually rely on the local time-frequency proximity of pitches to connect them into notes. Song-level MPT is much more difficult and requires more information, because pitches change discontinuously from note to note and the trajectories of different sources can even interweave. In this paper, we cast song-level MPT as a constrained clustering problem, in which the constraints encode the time-frequency locality of pitches and the clustering objective is their timbre consistency. Due to this problem's unique properties, existing constrained clustering algorithms cannot be dir...
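To make the constrained-clustering formulation concrete, the toy sketch below labels each pitch estimate with a source index. It is an illustration under stated assumptions, not the authors' algorithm. Everything in it is hypothetical: the names (`PitchEstimate`, `build_constraints`, `timbre_cost`, `brute_force_mpt`), the 0.3-semitone must-link threshold, the 2-D timbre vectors, and the exhaustive search. Time-frequency locality is encoded as pairwise constraints. Pitches in adjacent frames with nearly equal pitch are must-linked. Pitches in the same frame are cannot-linked, since a monophonic source sounds at most one pitch at a time. Timbre consistency is scored as the within-cluster sum of squared distances.

```python
from dataclasses import dataclass
from itertools import combinations, product


@dataclass
class PitchEstimate:
    frame: int                 # time-frame index of the MPE output
    midi: float                # pitch in MIDI units (semitones)
    timbre: tuple[float, ...]  # hypothetical timbre feature vector


def build_constraints(pitches, max_jump=0.3):
    """Derive pairwise constraints from time-frequency locality (toy rule).

    must-link:   pitches in adjacent frames differing by < max_jump semitones
    cannot-link: pitches in the same frame (a monophonic source has one pitch per frame)
    """
    must, cannot = [], []
    for i, j in combinations(range(len(pitches)), 2):
        a, b = pitches[i], pitches[j]
        if a.frame == b.frame:
            cannot.append((i, j))
        elif abs(a.frame - b.frame) == 1 and abs(a.midi - b.midi) < max_jump:
            must.append((i, j))
    return must, cannot


def satisfies(labels, must, cannot):
    """True if a labeling honors every must-link and cannot-link constraint."""
    return (all(labels[i] == labels[j] for i, j in must)
            and all(labels[i] != labels[j] for i, j in cannot))


def timbre_cost(pitches, labels):
    """Within-cluster sum of squared distances of timbre vectors to their centroid."""
    clusters = {}
    for p, k in zip(pitches, labels):
        clusters.setdefault(k, []).append(p.timbre)
    cost = 0.0
    for vecs in clusters.values():
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        cost += sum(sum((x - c) ** 2 for x, c in zip(v, centroid)) for v in vecs)
    return cost


def brute_force_mpt(pitches, n_sources, max_jump=0.3):
    """Search all labelings for the feasible one with the lowest timbre cost.

    Exponential in the number of pitches; only usable for tiny examples.
    """
    must, cannot = build_constraints(pitches, max_jump)
    best, best_cost = None, float("inf")
    for labels in product(range(n_sources), repeat=len(pitches)):
        if satisfies(labels, must, cannot):
            cost = timbre_cost(pitches, labels)
            if cost < best_cost:
                best, best_cost = list(labels), cost
    return best, best_cost


# Two sources over three frames; in frame 2 both start new notes,
# so locality yields no must-link into that frame and timbre must decide.
pitches = [
    PitchEstimate(0, 60.0, (1.0, 0.0)), PitchEstimate(0, 67.0, (0.0, 1.0)),
    PitchEstimate(1, 60.1, (0.9, 0.1)), PitchEstimate(1, 67.1, (0.1, 0.9)),
    PitchEstimate(2, 62.0, (1.0, 0.0)), PitchEstimate(2, 65.0, (0.0, 1.0)),
]
labels, cost = brute_force_mpt(pitches, n_sources=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

In this example, both assignments of the frame-2 pitches satisfy every constraint. Only the timbre objective separates the correct one from the swapped one. This mirrors the point above that locality alone cannot connect pitches across note boundaries. With a fixed threshold, two sources that cross or sound in near-unison can also produce must-links that contradict the cannot-links. A practical method has to handle that case, which this sketch does not.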