Data-driven Spoken Language Understanding (SLU) systems need semantically annotated data which are expensive, time consuming and prone to human errors. Active learning has been successfully applied to automatic speech recognition and utterance classification. In general, corpora annotation for SLU involves such tasks as sentence segmentation, chunking or frame labeling and predicate-argument annotation. In such cases human annotations are subject to errors increasing with the annotation complexity. We investigate two alternative noise-robust active learning strategies that are either data-intensive or supervision-intensive. The strategies detect likely erroneous examples and improve significantly the SLU performance for a given labeling cost. We apply uncertainty based active learning with conditional random fields on the concept segmentation task for SLU. We perform annotation experiments on two databases, namely ATIS (English) and Media (French). We show that our noise-robust alg...
Christian Raymond, G. Riccardfi