
ECIR
2009
Springer

Revisiting N-Gram Based Models for Retrieval in Degraded Large Collections

Traditional retrieval models based on term matching are not effective on collections of degraded documents (for instance, the output of OCR or ASR systems). This paper presents an n-gram based distributed model for retrieval on large collections of degraded text. Evaluation was carried out on both the TREC Confusion Track and Legal Track collections, showing that the presented approach outperforms, in terms of effectiveness, the classical term-centred approach and most of the participating systems in the TREC Confusion Track.
Javier Parapar, Ana Freire, Alvaro Barreiro
Added 08 Mar 2010
Updated 08 Mar 2010
Type Conference
Year 2009
Where ECIR