We report the results of a study on topic spotting in conversational speech. Using a machine learning approach, we build classifiers that accept an audio file of conversational human speech as input, and output an estimate of the topic being discussed. Our methodology makes use of a wellknown corpus of transcribed and topic-labeled speech (the Switchboard corpus), and involves an interesting double use of the BOOSTEXTER learning algorithm. Our work is distinguished from previous efforts in topic spotting by our explicit study of the effects of dialogue length on classifier performance, and by our use of off-theshelf speech recognition technology. One of our main results is the identification of a single classifier with good performance (relative to our classifier space) across all subdialogue lengths.
Kary Myers, Michael J. Kearns, Satinder P. Singh,