User generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's in...
Wouter Weerkamp, Krisztian Balog, Maarten de Rijke
With an increasing number of people that read, write and comment on blogs, the blogosphere has established itself as an essential medium of communication. A fundamental characteri...
: This paper presents the results of our team in the Genomics and Blog tracks in TREC 2007. We used the language model implementation provided by Indri for both tracks. For the BLO...
Miguel E. Ruiz, Ying Sun, Jianqiang Wang, Hongfang...