Abstract. We introduce neural higher-order linear-chain conditional random fields (NHO-LC-CRFs) together with a new structured regularizer for these sequence models. We show that this regularizer can be derived as a lower bound on a mixture of models that share parts of each other, e.g. neural sub-networks, and we relate it to ensemble learning. Furthermore, it can be expressed explicitly as a regularization term in the training objective. We demonstrate its effectiveness by exploring the introduced NHO-LC-CRFs for sequence labeling. Higher-order LC-CRFs with linear factors are well established for this task, but they cannot model non-linear dependencies. Such non-linear dependencies, however, can be modeled efficiently by neural higher-order input-dependent factors. One novelty of this work is to map sub-sequences of inputs to sub-sequences of outputs using distinct multilayer perceptron sub-networks. This mapping is important in many tasks, in particular for phoneme classification ...
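To make the sub-sequence mapping concrete, the sketch below shows one way a distinct MLP could realize a neural higher-order input-dependent factor: an order-k sub-network takes the concatenated input window x_{t-k+1..t} and produces one score per length-k label sub-sequence. This is a minimal illustrative sketch, not the paper's implementation; all names (SubSequenceMLP, num_labels, hidden_dim) and the architecture details are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class SubSequenceMLP:
    """Hypothetical order-k factor: maps an input sub-sequence of length k
    to scores over all |Y|**k label sub-sequences (names are illustrative)."""

    def __init__(self, order, input_dim, num_labels, hidden_dim=64):
        self.order = order
        in_dim = order * input_dim        # concatenated inputs x_{t-k+1..t}
        out_dim = num_labels ** order     # one score per label sub-sequence
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(0.0, 0.1, (hidden_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def scores(self, x_window):
        # Non-linear hidden layer gives the factor its non-linear
        # dependence on the input sub-sequence.
        h = np.tanh(x_window.reshape(-1) @ self.W1 + self.b1)
        return h @ self.W2 + self.b2      # factor scores for every y sub-sequence

# Usage: a third-order factor over 10-dimensional inputs and 4 labels.
mlp = SubSequenceMLP(order=3, input_dim=10, num_labels=4)
x_window = rng.normal(size=(3, 10))       # inputs x_{t-2}, x_{t-1}, x_t
phi = mlp.scores(x_window)                # shape (4**3,) = 64 factor scores
print(phi.shape)
```

In a full NHO-LC-CRF, such per-order factor scores would enter the usual log-linear CRF objective and be normalized by dynamic programming over the chain; only the factor parameterization (an MLP per order instead of linear features) changes relative to a standard higher-order LC-CRF.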