This paper presents a neural network approach to the problem of finding the dialogue act for a given utterance. So far, only symbolic, decision-tree and statistical approaches have been used to deal with a corpus as large as the VERBMOBIL corpus. We propose solutions to the questions of speech representation, network architecture and training in this context. We argue that, when using neural networks, a task like this can only be solved with a modular approach in which the training data is split and processed by different components of a larger network. Special care must be taken in constructing a feeding mechanism that avoids oscillatory behaviour caused by the heterogeneous data. We succeeded in constructing a modular neural network that showed interesting time-sensitive properties as well as recognition rates superior to those of most other methods. A first attempt at devising a hybrid system came very close to the best results in this field, which suggests that future architectures can be enhanced further.
M. Kipp