UTDallas at TREC 2008 Blog Track


Download trec.nist.gov

This paper describes our participation in the 2008 TREC Blog track. Our system consists of three components: data preprocessing, topic retrieval, and opinion finding. For the topic retrieval task, we applied the Lemur IR toolkit and used various query expansion techniques. For the opinion finding and polarization task, we employed a feature-based classification approach, then re-ranked the results using a linear combination of the opinionated score and the topic relevance score. Our system achieved reasonable performance in this evaluation.

1 System Overview

We participated in several tasks of the 2008 TREC Blog Track. Figure 1 shows the flow diagram of our system. First, data preprocessing removes HTML tags and extraneous context and extracts the content from the blog web pages. Second, we apply the Lemur Information Retrieval toolkit to retrieve 1000 relevant documents for each topic. Query terms are selected from the title and descriptions, and weighted according to their TF...
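The re-ranking step mentioned in the abstract, a linear combination of the opinionated score and the topic relevance score, could look roughly like the sketch below. This is only an illustration under stated assumptions: the function name `rerank`, the weight `alpha`, its default of 0.5, and the choice to treat a missing opinion score as 0.0 are all hypothetical and are not taken from the paper, which does not give its weights or score normalization in this excerpt.

```python
from typing import Dict, List, Tuple


def rerank(
    topic_scores: Dict[str, float],
    opinion_scores: Dict[str, float],
    alpha: float = 0.5,  # hypothetical interpolation weight, not from the paper
) -> List[Tuple[str, float]]:
    """Re-rank documents by a linear combination of two scores.

    combined = alpha * opinion_score + (1 - alpha) * topic_score

    Both score dictionaries are assumed to be on comparable scales
    (for example, already normalized to [0, 1]).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    combined = []
    for doc_id, topic_score in topic_scores.items():
        # Assumption: a document without an opinion score gets 0.0.
        opinion_score = opinion_scores.get(doc_id, 0.0)
        score = alpha * opinion_score + (1.0 - alpha) * topic_score
        combined.append((doc_id, score))

    # Highest combined score first.
    combined.sort(key=lambda pair: pair[1], reverse=True)
    return combined


if __name__ == "__main__":
    topic = {"d1": 0.9, "d2": 0.5, "d3": 0.7}
    opinion = {"d1": 0.1, "d2": 0.9, "d3": 0.4}
    for doc_id, score in rerank(topic, opinion, alpha=0.5):
        print(doc_id, round(score, 3))
```

Setting `alpha` to 0 reduces the ranking to pure topic relevance, while larger values give more weight to the opinion classifier; in practice such a weight would be tuned on held-out topics.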

Bin Li, Feifan Liu, Yang Liu
