We propose a system that extracts the melody line played by a solo instrument from complex audio. In every time frame, multiple fundamental frequency (F0) hypotheses are generated, and subsequent processing draws on several knowledge sources to choose the most likely sequence of F0s. These knowledge sources include an instrument recognition module and temporal knowledge about tone durations and interval transitions, all of which are integrated in a probabilistic search. Compared to a baseline system that simply takes the strongest F0 at every point in time, the proposed system improved the number of frames with correct F0 estimates by 14%. The number of spurious tones was reduced to nearly a third of the number produced by the baseline system, resulting in significantly smoother melody lines.
Jana Eggink, Guy J. Brown
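
The contrast between the baseline and a search over F0 hypotheses can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: the instrument recognition module and the tone-duration model are omitted, the candidate strengths simply stand in for whatever per-hypothesis likelihood is available, and the names `strongest_f0`, `best_f0_path`, and `interval_weight` are hypothetical. The only temporal knowledge assumed here is a penalty proportional to the interval, in semitones, between consecutive F0s.

```python
import math


def semitones(f_a, f_b):
    """Absolute interval between two frequencies, in semitones."""
    return abs(12.0 * math.log2(f_a / f_b))


def strongest_f0(frames):
    """Baseline: pick the strongest F0 candidate in every frame independently."""
    return [max(cands, key=lambda c: c[1])[0] for cands in frames]


def best_f0_path(frames, interval_weight=0.5):
    """Viterbi-style search over per-frame F0 candidates.

    frames: list of frames, each a list of (f0_hz, strength) with strength > 0.
    interval_weight: assumed penalty per semitone of jump between frames.
    """
    # scores[i]: best log score of any path ending in candidate i of the current frame
    scores = [math.log(s) for _, s in frames[0]]
    backptrs = []
    for prev, cur in zip(frames, frames[1:]):
        new_scores, ptrs = [], []
        for f_cur, s_cur in cur:
            # Score of reaching this candidate from each candidate of the previous frame
            options = [
                scores[j] - interval_weight * semitones(f_prev, f_cur)
                for j, (f_prev, _) in enumerate(prev)
            ]
            j_best = max(range(len(options)), key=options.__getitem__)
            new_scores.append(options[j_best] + math.log(s_cur))
            ptrs.append(j_best)
        scores = new_scores
        backptrs.append(ptrs)

    # Backtrack from the best final candidate
    i = max(range(len(scores)), key=scores.__getitem__)
    path = [i]
    for ptrs in reversed(backptrs):
        i = ptrs[i]
        path.append(i)
    path.reverse()
    return [frames[t][k][0] for t, k in enumerate(path)]


# Toy input: in the middle frame the strongest candidate is an octave too high.
frames = [
    [(440.0, 0.9), (220.0, 0.5)],
    [(880.0, 0.6), (440.0, 0.5)],
    [(440.0, 0.8), (330.0, 0.4)],
]

print(strongest_f0(frames))   # baseline follows the spurious octave jump
print(best_f0_path(frames))   # search keeps the melody on 440 Hz
```

On this toy input the baseline yields 440, 880, 440 Hz, while the search yields 440 Hz throughout, because the slightly weaker 440 Hz candidate in the middle frame avoids two octave-sized interval penalties. A fuller version would add the remaining knowledge sources as extra terms in the path score.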