We present a novel maximum-likelihood framework for automatic spike sorting based on a joint statistical model of action potential waveform shape and inter-spike interval durations of cortical neuronal firing clusters. We derive an expression for the joint likelihood of the observed waveforms, the neuronal firing times, and the hidden neuronal labels. We then use an iterative unsupervised procedure that performs clustering and parameter estimation simultaneously to find the maximum-likelihood sequence of neuronal labels. We evaluate our method on the WaveClus artificial dataset with 2483 firing events and obtain a significant improvement in clustering accuracy over the waveform-only EM-GMM baseline in high-noise conditions.
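
To make the idea of a joint likelihood over waveforms, firing times, and labels concrete, the following is a minimal sketch and not the model derived in this work. It assumes, purely for illustration, an isotropic Gaussian over waveform features and an exponential distribution over the inter-spike intervals of each neuron. The function names, the parameter dictionary layout (`mean`, `var`, `rate`), and both distributional choices are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of an isotropic Gaussian (assumed waveform model)."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2.0 * math.pi * var) + sq_dist / var)


def exponential_logpdf(isi, rate):
    """Log density of an exponential ISI (assumed timing model)."""
    return math.log(rate) - rate * isi


def joint_log_likelihood(waveforms, times, labels, params):
    """Sum waveform and ISI log-likelihood terms for one labeling.

    waveforms: list of feature vectors, one per firing event
    times:     firing times in ascending order
    labels:    hypothesized neuron index for each event
    params:    dict mapping neuron index -> {"mean", "var", "rate"}
    """
    last_time = {}  # most recent firing time seen for each neuron
    total = 0.0
    for x, t, k in zip(waveforms, times, labels):
        p = params[k]
        # Waveform term: how well the shape matches neuron k.
        total += gaussian_logpdf(x, p["mean"], p["var"])
        # ISI term: interval since neuron k last fired. The first
        # event of each neuron contributes no ISI term in this sketch.
        if k in last_time:
            total += exponential_logpdf(t - last_time[k], p["rate"])
        last_time[k] = t
    return total
```

Because the ISI of an event depends on which earlier events share its label, changing one label changes the timing terms of its neighbors. This coupling is why the labels are treated as a sequence to be optimized jointly rather than assigned event by event, as a waveform-only mixture model would do.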