Previous work on statistical language generation has primarily focused on grammaticality and naturalness, scoring generation possibilities according to a language model or user feedback. More recent work has investigated data-driven techniques for controlling linguistic style without overgeneration, by reproducing variation dimensions extracted from corpora. Another line of work has produced handcrafted rule-based systems that control specific stylistic dimensions, such as politeness and personality. This paper describes a novel approach that automatically learns to produce recognisable variation along a meaningful stylistic dimension, personality, without the computational cost incurred by overgeneration techniques. We present the first evaluation of a data-driven generation method that projects multiple personality traits simultaneously and on a continuous scale, and we compare its performance with that of a rule-based generator in the same domain.
François Mairesse, Marilyn A. Walker