Homograph ambiguity is an original issue in Text-to-Speech (TTS). To disambiguate homographs, several efficient approaches have been proposed, such as part-of-speech (POS) n-gram, B...
Statistical measures of word similarity have application in many areas of natural language processing, such as language modeling and information retrieval. We report a comparative...
We report on our work to automatically build a corpus of instructional text annotated with lexical semantics information. We have coupled the parser LCFLEX with a lexicon and onto...
We report here empirical results of a series of studies aimed at automatically predicting information quality in news documents. Multiple research methods and data analysis techni...
Rong Tang, Kwong Bor Ng, Tomek Strzalkowski, Paul ...
This paper investigates bootstrapping for statistical parsers to reduce their reliance on manually annotated training data. We consider both a mostly-unsupervised approach, co-tra...
Mark Steedman, Rebecca Hwa, Stephen Clark, Miles O...
We introduce two probabilistic models that can be used to identify elementary discourse units and build sentence-level discourse parse trees. The models use syntactic and lexical ...
Leximancer is a software system for performing conceptual analysis of text data in a largely language independent manner. The system is modelled on Content Analysis and provides u...
Automatic restoration of punctuation from unpunctuated text has application in improving the fluency and applicability of speech recognition systems. We explore the possibility t...
Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling...