Vocabulary Reduction and Text Enrichment at WebCLEF

Cross-lingual Information Retrieval (IR) remains one of the major open challenges in the field. A further important issue in IR is reducing the corpus vocabulary so that methods such as the well-known vector space model become practical in real situations. In this work, we consider a vocabulary reduction process based on the selection of mid-frequency terms. This approach enhances precision; to obtain better recall, we also apply an enrichment process that adds co-occurring terms. With this approach, we obtained an improvement of 40% on the corpus of the BiEnEs WebCLEF 2005 task. The results obtained in the mixed monolingual task of WebCLEF 2006 show that text enrichment must be performed before vocabulary reduction in order to get the best performance.

Categories and Subject Descriptors H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexi...
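The two steps named in the abstract, and the ordering it reports as best (enrichment first, reduction second), can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the mid-frequency band is chosen or how co-occurrence is measured, so the fixed thresholds `low` and `high`, the document-level co-occurrence counts, and the `top_k` cutoff are all assumptions made for illustration, as are the function names.

```python
from collections import Counter, defaultdict


def mid_frequency_terms(docs, low, high):
    """Return the terms whose corpus frequency lies in [low, high].

    ASSUMPTION: a fixed frequency band; the abstract does not state
    how the mid-frequency range is actually determined.
    """
    freq = Counter(term for doc in docs for term in doc)
    return {term for term, count in freq.items() if low <= count <= high}


def cooccurrence_map(docs, top_k):
    """Map each term to the top_k terms it most often shares a document with.

    ASSUMPTION: co-occurrence is counted once per document; ties are
    broken alphabetically so the result is deterministic.
    """
    counts = defaultdict(Counter)
    for doc in docs:
        unique = set(doc)
        for term in unique:
            for other in unique:
                if term != other:
                    counts[term][other] += 1
    return {
        term: [u for u, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
        for term, c in counts.items()
    }


def enrich(doc, cooc):
    """Append the co-occurring terms of every term in the document."""
    return doc + [u for term in doc for u in cooc.get(term, [])]


def reduce_doc(doc, vocab):
    """Keep only the terms that belong to the reduced vocabulary."""
    return [term for term in doc if term in vocab]


def pipeline(docs, low, high, top_k):
    """Enrich every document first, then reduce the vocabulary.

    ASSUMPTION: the mid-frequency band is computed on the enriched corpus.
    """
    cooc = cooccurrence_map(docs, top_k)
    enriched = [enrich(doc, cooc) for doc in docs]
    vocab = mid_frequency_terms(enriched, low, high)
    return [reduce_doc(doc, vocab) for doc in enriched]
```

Each document is a list of already tokenized terms, so a call looks like `pipeline(docs, low=2, high=5, top_k=1)`. Swapping the two steps inside `pipeline` gives the reduce-then-enrich variant for comparison.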

Franco Rojas López, Héctor Jiménez-Salazar, David Pinto

CLEF 2006 | Corpus Vocabulary Reduction | Information Management | Vocabulary Reduction | Vocabulary Reduction Process

Post Info

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2006
Where	CLEF
Authors	Franco Rojas López, Héctor Jiménez-Salazar, David Pinto
