It has been shown that trained experts can read speech spectrograms. In this work, we regard the speech spectrogram image as text written in an unknown language and segment it in order to capture the energy associated with each formant. We propose an algorithm based on Mathematical Morphology operators, chiefly the watershed transform. The result is a robust segmentation of wideband speech spectrograms that can later be used for automatic speech recognition. We present experimental results for different phoneme classes.
Raphael Steinberg, Douglas D. O'Shaughnessy
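The abstract does not give the details of the proposed algorithm. As a rough illustration only, the following is a minimal sketch of generic marker-controlled watershed segmentation (priority flooding) on a toy magnitude grid. It assumes rows are frequency bins and columns are time frames, uses 4-connectivity, takes markers to be strict local energy maxima above a threshold, and floods in order of decreasing energy so that each formant-like peak becomes a basin. The function names `find_markers` and `watershed` and the `floor` threshold are hypothetical, the morphological preprocessing mentioned above is omitted, and this is not the authors' method.

```python
import heapq
from typing import Iterator, List, Tuple

Grid = List[List[float]]
Labels = List[List[int]]


def neighbors(r: int, c: int, rows: int, cols: int) -> Iterator[Tuple[int, int]]:
    """Yield the in-bounds 4-connected neighbours of pixel (r, c)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def find_markers(energy: Grid, floor: float) -> Labels:
    """Give each strict local energy maximum above `floor` its own label.

    Assumption (hypothetical choice): flat-topped peaks produce no marker.
    """
    rows, cols = len(energy), len(energy[0])
    markers = [[0] * cols for _ in range(rows)]
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            e = energy[r][c]
            if e > floor and all(e > energy[nr][nc] for nr, nc in neighbors(r, c, rows, cols)):
                markers[r][c] = next_label
                next_label += 1
    return markers


def watershed(energy: Grid, markers: Labels, floor: float) -> Labels:
    """Marker-controlled watershed by priority flooding.

    Elevation is taken as -energy, so each energy peak is a basin and pixels
    are flooded in order of decreasing energy. Pixels at or below `floor`
    are treated as background and keep label 0.
    """
    rows, cols = len(energy), len(energy[0])
    labels = [row[:] for row in markers]
    heap: List[Tuple[float, int, int, int, int]] = []
    order = 0  # insertion counter: breaks priority ties deterministically

    def push_neighbors(r: int, c: int, label: int) -> None:
        nonlocal order
        for nr, nc in neighbors(r, c, rows, cols):
            if labels[nr][nc] == 0 and energy[nr][nc] > floor:
                heapq.heappush(heap, (-energy[nr][nc], order, nr, nc, label))
                order += 1

    # Seed the queue with the unlabelled neighbours of every marker.
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                push_neighbors(r, c, labels[r][c])

    # Flood: the highest-energy pending pixel joins the region that reached it first.
    while heap:
        _, _, r, c, label = heapq.heappop(heap)
        if labels[r][c]:
            continue  # already claimed through an earlier queue entry
        labels[r][c] = label
        push_neighbors(r, c, label)
    return labels


if __name__ == "__main__":
    # Toy magnitude grid: rows are frequency bins, columns are time frames.
    spec = [
        [1, 1, 1],
        [5, 9, 5],  # first formant-like ridge, peak 9
        [3, 4, 3],
        [2, 3, 2],
        [4, 8, 4],  # second formant-like ridge, peak 8
        [1, 1, 1],
    ]
    seg = watershed(spec, find_markers(spec, 1.5), 1.5)
    for row in seg:
        print(row)
    # Rows 1-2 get label 1, rows 3-4 get label 2, rows 0 and 5 stay 0.
```

Flooding from the highest energy downward means the low-energy valley between two ridges is split between them, and raising `floor` leaves more of that valley unassigned.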