We show how to improve a state-of-the-art neural network language model that converts the previous "context" words into feature vectors and combines these feature vectors to predict the feature vector of the next word. Significant improvements in predictive accuracy are achieved by using higher-level features to modulate the effects of the context words. This is more effective than using the higher-level features to directly predict the feature vector of the next word, but it is also possible to combine both methods.
Zhang Yuecheng, Andriy Mnih, Geoffrey E. Hinton
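The abstract does not give the exact form of the combination or of the modulation. A minimal sketch follows. It assumes a log-bilinear-style predictor: each context word's feature vector is multiplied by a position-dependent matrix and the results are summed to give the predicted feature vector, then every candidate word is scored by the dot product of its feature vector with that prediction, followed by a softmax. The "higher-level features" are modeled as a tanh hidden layer over the concatenated context features. Modulation is shown as a hypothetical elementwise sigmoid gate on each context word's contribution, and direct prediction is shown as an added linear term. The names `hidden_features`, `gates_from_hidden`, `W`, `U`, and `A` are illustrative assumptions and are not taken from the paper. The weights are untrained random values.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in M]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def hidden_features(context, R, W):
    """Hypothetical higher-level features: tanh of a linear map of the
    concatenated context feature vectors."""
    x = [f for w in context for f in R[w]]
    return [math.tanh(v) for v in matvec(W, x)]

def gates_from_hidden(h, U):
    """Hypothetical modulation: one gate vector in (0, 1) per context position."""
    return [[sigmoid(v) for v in matvec(U_i, h)] for U_i in U]

def predict_feature(context, R, C, gates=None, direct=None):
    """Predicted next-word feature vector: sum over positions of C[i] @ R[w_i],
    optionally gated elementwise per position and/or shifted by a direct term."""
    D = len(R[0])
    r_hat = [0.0] * D
    for i, w in enumerate(context):
        contrib = matvec(C[i], R[w])
        if gates is not None:
            contrib = [g * c for g, c in zip(gates[i], contrib)]
        r_hat = [a + c for a, c in zip(r_hat, contrib)]
    if direct is not None:
        r_hat = [a + d for a, d in zip(r_hat, direct)]
    return r_hat

def next_word_distribution(r_hat, R, b):
    """Score every word by r_hat . R[w] + b[w] and normalize with a softmax."""
    return softmax([dot(r_hat, R[w]) + b[w] for w in range(len(R))])

# Toy sizes and small random weights (untrained; for illustration only).
rng = random.Random(0)
V, D, n, H = 10, 4, 3, 5  # vocabulary, feature dim, context length, hidden units

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

R = rand_matrix(V, D)                       # word feature vectors
C = [rand_matrix(D, D) for _ in range(n)]   # position-dependent combination matrices
b = [0.0] * V                               # word biases
W = rand_matrix(H, n * D)                   # context features -> hidden units
U = [rand_matrix(D, H) for _ in range(n)]   # hidden units -> gates per position
A = rand_matrix(D, H)                       # hidden units -> direct feature prediction

context = [1, 4, 7]
h = hidden_features(context, R, W)
gates = gates_from_hidden(h, U)

p_plain = next_word_distribution(predict_feature(context, R, C), R, b)
p_gated = next_word_distribution(predict_feature(context, R, C, gates=gates), R, b)
p_both = next_word_distribution(
    predict_feature(context, R, C, gates=gates, direct=matvec(A, h)), R, b)
```

Calling `predict_feature(context, R, C, direct=matvec(A, h))` without gates corresponds to the alternative of letting the higher-level features predict the next word's feature vector directly, while `p_both` passes both arguments to illustrate combining the two. In this sketch the gates only rescale how much each context word contributes; they never add new directions to the prediction.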