This paper proposes a way of modelling the time-varying spectral energy distribution of musical instrument sounds. The model consists of three parts: an excitation signal, a body response filter, and a loss filter that implements a frequency-dependent decay. Each part is further represented with a linear model, which allows the number of parameters to be controlled. A method is proposed for estimating all the model parameters jointly while taking additive noise into account. The method is evaluated by measuring how accurately it represents the sounds of 33 musical instruments and by testing its usefulness in extracting the melodic line of one instrument from a polyphonic audio signal.
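The excitation, body and loss structure can be illustrated with a minimal sketch. This sketch is a labelled simplification, not the paper's exact formulation. It assumes that the energy of partial m at frame n is the product of three factors. The first is an excitation term that depends on the harmonic index m. The second is a body response evaluated at the partial frequency f_m = m·f0. The third is a loss factor raised to the power n. Under this assumption the log-energy is a sum of three terms. Each term is expanded on a small cosine basis, which makes the log-energy linear in the stacked coefficients. The number of basis functions per part then sets the parameter count. The names `cosine_basis` and `design_matrix` are hypothetical, and so are the choices of basis and of `f_max`. The fit below uses plain least squares on noiseless synthetic data. Unlike the estimation method described above, it does not model additive noise.

```python
import numpy as np

def cosine_basis(x, x_max, num_basis):
    """Hypothetical basis: num_basis cosines over [0, x_max]; shape (len(x), num_basis)."""
    x = np.asarray(x, dtype=float)
    return np.cos(np.pi * np.outer(x / x_max, np.arange(num_basis)))

def design_matrix(f0, num_partials, num_frames, p_exc, p_body, p_loss, f_max=8000.0):
    """Rows index (frame, partial) pairs; columns hold excitation, body and loss coefficients."""
    m = np.arange(1, num_partials + 1)             # harmonic indices
    f = m * f0                                     # partial frequencies in Hz
    B_exc = cosine_basis(m, num_partials, p_exc)   # assumed: excitation varies with harmonic index
    B_body = cosine_basis(f, f_max, p_body)        # body response varies with frequency
    B_loss = cosine_basis(f, f_max, p_loss)        # per-frame log loss varies with frequency
    rows = []
    for n in range(num_frames):
        # log E[n, m] = log e(m) + log h(f_m) + n * log a(f_m)
        rows.append(np.hstack([B_exc, B_body, n * B_loss]))
    return np.vstack(rows)  # shape (num_frames * num_partials, p_exc + p_body + p_loss)

NUM_FRAMES, NUM_PARTIALS = 50, 20
X = design_matrix(f0=220.0, num_partials=NUM_PARTIALS, num_frames=NUM_FRAMES,
                  p_exc=4, p_body=6, p_loss=3)

rng = np.random.default_rng(0)
# Random excitation/body coefficients; loss coefficients chosen so that
# log a(f) <= -0.05 + 0.02 + 0.01 < 0 for every partial, i.e. every partial decays.
theta_true = np.concatenate([rng.normal(scale=0.5, size=4 + 6), [-0.05, -0.02, 0.01]])

log_E = X @ theta_true                               # synthetic log-energies
E = np.exp(log_E).reshape(NUM_FRAMES, NUM_PARTIALS)  # energy per (frame, partial)

# Least-squares fit in the log domain (noiseless; no additive-noise handling).
theta_hat, *_ = np.linalg.lstsq(X, log_E, rcond=None)
max_err = np.max(np.abs(X @ theta_hat - log_E))
print("parameters:", X.shape[1], "max log-energy error:", max_err)
```

One design consequence of this particular basis is that the constant terms of the excitation and body parts are interchangeable. Only their sum affects the model output, so the design matrix is rank-deficient. In that case `np.linalg.lstsq` returns the minimum-norm coefficients, and these reproduce the log-energies even though they need not equal `theta_true`.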