Revisiting N-Gram Based Models for Retrieval in Degraded Large Collections

Traditional retrieval models based on term matching are not effective in collections of degraded documents (for instance, the output of OCR or ASR systems). This paper presents an n-gram based distributed model for retrieval on large collections of degraded text. Evaluation was carried out with both the TREC Confusion Track and Legal Track collections, showing that the presented approach is more effective than the classical term-centred approach and than most of the participating systems in the TREC Confusion Track.
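To illustrate why n-gram matching can tolerate recognition errors where exact term matching fails, here is a minimal sketch. It is not the paper's model: the character trigram size, the `_` word-boundary padding, the simple overlap score, and all function names are assumptions made for this example, and the sample OCR errors are invented.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of each whitespace token in text."""
    grams = set()
    for token in text.lower().split():
        padded = f"_{token}_"  # assumed padding to mark word boundaries
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return grams


def term_overlap(query, document):
    """Fraction of query terms that appear verbatim in the document."""
    q = set(query.lower().split())
    d = set(document.lower().split())
    return len(q & d) / len(q) if q else 0.0


def ngram_overlap(query, document, n=3):
    """Fraction of query character n-grams that appear in the document."""
    q = char_ngrams(query, n)
    d = char_ngrams(document, n)
    return len(q & d) / len(q) if q else 0.0


# Hypothetical OCR output: "m" read as "rn", and "l" read as "1".
clean_query = "information retrieval"
ocr_document = "inforrnation retrieva1 systems"

print(term_overlap(clean_query, ocr_document))   # no exact term survives
print(ngram_overlap(clean_query, ocr_document))  # most trigrams survive
```

In this toy case, every query term is corrupted in the document, so exact term matching finds nothing, while most of the query's character trigrams are still present.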

Javier Parapar, Ana Freire, Alvaro Barreiro

Computer Science | ECIR 2009 | Legal Track Collections | Term Centred Approach | TREC Confusion Track |

» Performance Evaluation of a Distributed Architecture for Information Retrieval

Post Info

Added	08 Mar 2010
Updated	08 Mar 2010
Type	Conference
Year	2009
Where	ECIR
Authors	Javier Parapar, Ana Freire, Alvaro Barreiro
