This paper discusses the challenges that arise when large speech corpora receive an ever-broadening range of diverse and distinct annotations. Two case studies of this process are...
The multidimensional, heterogeneous, and temporal nature of speech databases raises interesting challenges for representation and query. Recently, annotation graphs have been prop...
We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The ...
Steven Bird, David Day, John S. Garofolo, John Hen...
This paper describes a new method, COMBI-BOOTSTRAP, to exploit existing taggers and lexical resources for the annotation of corpora with new tagsets. COMBI-BOOTSTRAP uses existing...
In this paper, Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polys...
Finite-state morphology in the general tradition of the Two-Level and Xerox implementations has proved very successful in the production of robust morphological analyzer-generator...
This paper presents a semantic parsing approach for non-domain-specific texts. Semantic parsing is one of the major bottlenecks of Natural Language Understanding (NLU) systems and...
Previous work (Frank and Satta, 1998; Karttunen, 1998) has shown that Optimality Theory with gradient constraints generally is not finite state. A new finite-state treatment of gr...
A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segm...
Elizabeth Shriberg, Andreas Stolcke, Dilek Z. Hakk...