We describe a corpus-based approach to natural language generation (NLG). The approach has been implemented as a component of a spoken dialog system and a series of evaluations were carried out. Our system uses n-gram language models, which have been found useful in other language technology applications, in a generative mode. It is not yet clear whether the simple n-grams can adequately model human language generation in general, but we show that we can successfully apply this ubiquitous modeling technique to the task of natural language generation for spoken dialog systems. In this paper, we discuss applying corpus-based stochastic language generation at two levels: content selection and sentence planning/realization. At the content selection level, output utterances are modeled by bigrams, and the appropriate attributes are chosen using bigram statistics. In sentence planning and realization, corpus utterances are modeled by n-grams of varying length, and new utterances are generat...
Alice Oh, Alexander I. Rudnicky