The detection of new information in a document stream is an important component of many potential applications. In this work, a new novelty detection approach based on the identification of sentence level information patterns is proposed. First, the information- pattern concept for novelty detection is presented with the emphasis on new information patterns for general topics (queries) that cannot be simply turned into specific questions whose answers are specific named entities (NEs). Then we elaborate a thorough analysis of sentence level information patterns on data from the TREC novelty tracks, including sentence lengths, named entities, sentence level opinion patterns. This analysis provides guidelines in applying those patterns in novelty detection particularly for the general topics. Finally, a unified pattern-based approach is presented to novelty detection for both general and specific topics. The new method for dealing with general topics will be the focus. Experimental resu...
Xiaoyan Li, W. Bruce Croft