The multidimensional, heterogeneous, and temporal nature of speech databases raises interesting challenges for representation and query. Recently, annotation graphs have been prop...
We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to tools for manipulating these annotations. The ...
Steven Bird, David Day, John S. Garofolo, John Hen...
This paper describes a new method, COMBI-BOOTSTRAP, to exploit existing taggers and lexical resources for the annotation of corpora with new tagsets. COMBI-BOOTSTRAP uses existing...
In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polys...
Finite-state morphology in the general tradition of the Two-Level and Xerox implementations has proved very successful in the production of robust morphological analyzer-generator...
This paper presents a semantic parsing approach for non-domain-specific texts. Semantic parsing is one of the major bottlenecks of Natural Language Understanding (NLU) systems and...
Previous work (Frank and Satta, 1998; Karttunen, 1998) has shown that Optimality Theory with gradient constraints generally is not finite state. A new finite-state treatment of gr...
A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segm...
Elizabeth Shriberg, Andreas Stolcke, Dilek Z. Hakk...
It has recently been argued that a Naive Bayesian classifier can be used to filter unsolicited bulk e-mail ("spam"). We conduct a thorough evaluation of this proposal on...
Ion Androutsopoulos, John Koutsias, Konstantinos C...