

Blogger, stick to your story: modeling topical noise in blogs with coherence measures

Topical noise in blogs arises when bloggers digress from the central topical thrust of their blogs. We introduce a method to explicitly incorporate a model of topical noise into a language modeling approach to the task of blog distillation. Topical noise is integrated into the model using a coherence score, which reflects the tightness of the topical structure of a blog. Tests performed on the TRECBlog06 corpus show that a naive integration of the coherence score as a blog prior fails to improve performance. Instead, we develop a set of more sophisticated models in which the coherence score is weighted by a function of the blog retrieval score. The proposed models improve the effectiveness of our language modeling approach to the blog distillation task.
Categories and Subject Descriptors: H.3 [Information Storage and Retrieval]: H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.4 [Information Systems Applications]: H.4.2 Types of Systems; H.4.m Miscellan...
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2008
Authors Jiyin He, Wouter Weerkamp, Martha Larson, Maarten de Rijke