In sequential Viterbi models, such as HMMs, MEMMs, and linear-chain CRFs, the types of patterns over output sequences that can be learned depend directly on the model's structure: any pattern that spans more output tags than are covered by the model's order will be very difficult to learn. However, increasing a model's order also increases the number of model parameters, making the model more susceptible to sparse-data problems. This paper shows how the notion of output transformation can be used to explore a variety of alternative model structures. Using output transformations, we can selectively increase the amount of contextual information available for some conditions but not for others, allowing us to capture longer-distance consistencies while avoiding unnecessary growth in the model's parameter space. The appropriate output transformation for a given task can be selected by applying a hill-climbing approach to held-out data. On the NP Ch...
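To make the idea concrete, here is a minimal sketch of a selective output transformation, not taken from the paper: it relabels only the "I" (inside-chunk) tag of a hypothetical BIO tag set with its predecessor, so a first-order model sees second-order context for that one condition while the tag inventory, and hence the parameter space, for all other tags is left unchanged.

```python
def transform(tags):
    """Map a tag sequence into the transformed output space.

    Only "I" tags are augmented with the previous tag (e.g. "I|B"),
    giving a first-order model extra context for that condition alone.
    """
    out = []
    for i, tag in enumerate(tags):
        if tag == "I" and i > 0:
            out.append(f"I|{tags[i - 1]}")
        else:
            out.append(tag)
    return out


def inverse_transform(tags):
    """Recover the original tag sequence after decoding."""
    return [t.split("|")[0] for t in tags]


if __name__ == "__main__":
    original = ["B", "I", "I", "O", "B", "I"]
    encoded = transform(original)
    print(encoded)  # ['B', 'I|B', 'I|I', 'O', 'B', 'I|B']
    assert inverse_transform(encoded) == original
```

A model is trained and decoded in the transformed space, and `inverse_transform` maps its output back; because the mapping is deterministic and invertible, the original tagging task is unchanged.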