We consider the problem of word boundary detection in spontaneous speech utterances. Acoustic features have been well explored in the literature for word boundary detection; however, on spontaneous speech from the Switchboard-I corpus, we found that word boundary detection using acoustic features performs poorly (F-score 0.63). We propose a new feature that captures lexical cues for the word boundary detection problem. We show that including the proposed lexical feature along with the usual acoustic features improves word boundary detection performance considerably (F-score 0.81). We also demonstrate the robustness of the proposed feature at different noise levels for additive white and pink noise.
Andreas Tsiartas, Prasanta K. Ghosh, Panayiotis G.