This paper presents work done at Cambridge University for the TREC-9 Spoken Document Retrieval (SDR) track. The CUHTK transcriptions from TREC-8, with a Word Error Rate (WER) of 20.5%, were used in conjunction with stopping, Porter stemming, Okapi-style weighting and query expansion using a contemporaneous corpus of newswire. A windowing/recombination strategy was applied for the case where story boundaries were unknown (SU), giving a final Average Precision of 38.8% and 43.0% for the TREC-9 short and terse queries respectively. The corresponding results for the story boundaries known
Sue E. Johnson, P. Jourlin, Karen Sparck Jones, Ph