Construction of a Benchmark Data Set for Cross-lingual Word Sense Disambiguation

Given the recent trend toward evaluating word sense disambiguation systems in a more application-oriented setup, we report on the construction of a multilingual benchmark data set for cross-lingual word sense disambiguation. The data set was created for a lexical sample of 25 English nouns, for which translations were retrieved in five languages: Dutch, German, French, Italian and Spanish. The sense inventory was built from the Europarl parallel corpus. The gold standard sense inventory was based on the automatic word alignments of the parallel corpus, which were manually verified. The verified word alignments were then used to manually cluster the translations across all languages in the parallel corpus. The inventory then served as input for the annotators of the sentences, who were asked to provide a maximum of three contextually relevant translations per language for a given focus word. The data set was released in the framework of the SemEva...
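
The structure described in the abstract (translations clustered by sense across five languages, and at most three translations per language for each annotated sentence) could be sketched roughly as below. This is only an illustration, not the authors' actual file format or tooling: the class names, the `validate` helper, the language codes, and the example translations for "bank" are all hypothetical, and the rule that annotators may only pick translations listed in the inventory is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Language codes are an assumption; the abstract only names the languages.
LANGUAGES = ("nl", "de", "fr", "it", "es")  # Dutch, German, French, Italian, Spanish
MAX_TRANSLATIONS = 3  # per language and per sentence, as stated in the abstract


@dataclass
class SenseCluster:
    """One sense of a focus word: a cluster of translations over all languages."""
    label: str
    translations: Dict[str, Set[str]]


@dataclass
class Annotation:
    """Translations chosen by an annotator for one sentence (hypothetical layout)."""
    focus_word: str
    sentence: str
    chosen: Dict[str, List[str]] = field(default_factory=dict)


def validate(annotation: Annotation, inventory: List[SenseCluster]) -> List[str]:
    """Return a list of problems; an empty list means the annotation passes."""
    # Collect every translation the inventory allows, per language.
    allowed: Dict[str, Set[str]] = {lang: set() for lang in LANGUAGES}
    for cluster in inventory:
        for lang, words in cluster.translations.items():
            allowed.setdefault(lang, set()).update(words)

    problems: List[str] = []
    for lang, words in annotation.chosen.items():
        if len(words) > MAX_TRANSLATIONS:
            problems.append(f"{lang}: more than {MAX_TRANSLATIONS} translations")
        for word in words:
            # Assumption: choices must come from the inventory.
            if word not in allowed.get(lang, set()):
                problems.append(f"{lang}: '{word}' is not in the inventory")
    return problems


# Made-up inventory for illustration; not taken from the released data set.
inventory = [
    SenseCluster("financial institution",
                 {"nl": {"bank"}, "de": {"Bank"}, "fr": {"banque"},
                  "it": {"banca"}, "es": {"banco"}}),
    SenseCluster("river side",
                 {"nl": {"oever"}, "de": {"Ufer"}, "fr": {"rive"},
                  "it": {"riva"}, "es": {"orilla"}}),
]
```

Calling `validate` on an annotation whose choices all appear in the inventory and that lists no more than three translations per language returns an empty list; otherwise it returns one message per violation.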

Els Lefever, Véronique Hoste

Data Set | Education | LREC 2010 | Sense Inventory | Word Sense Disambiguation

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Els Lefever, Véronique Hoste
