Methods for the Extraction of Hungarian Multi-Word Lexemes

This paper describes an experiment on extracting Hungarian multi-word lexemes from a corpus using statistical methods. Corpus preparation (the addition of POS tags and stems) was done automatically. From the corpus, verb+noun+casemark patterns were extracted as collocation candidates. Evaluation shows that the statistical methods used by Villada Moirón (2004a) to identify Dutch V + PP collocations can also be applied to the Hungarian data. Some collocation types (such as verbal arguments) require special extraction methods, as explained in the evaluation section. Finally, we suggest that the extraction process can be further improved by blending statistical techniques with rule-based and dictionary-based methods.
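
As a rough illustration of the pipeline the abstract outlines, the sketch below ranks verb+noun+casemark candidates by an association score. The abstract does not say which statistical measures were used, so pointwise mutual information (PMI) is an assumed stand-in here, not the authors' method. The function name `rank_candidates`, the `min_freq` threshold, the case labels, and the toy triples are all hypothetical and invented for this example.

```python
import math
from collections import Counter


def rank_candidates(triples, min_freq=2):
    """Rank (verb_stem, noun_stem, case) triples by PMI (an assumed measure).

    Each triple is treated as a verb paired with a (noun, case) complement.
    Marginal counts are taken over all triples, including rare ones.
    """
    triples = list(triples)
    total = len(triples)
    pair_counts = Counter((v, (n, c)) for v, n, c in triples)
    verb_counts = Counter(v for v, _, _ in triples)
    comp_counts = Counter((n, c) for _, n, c in triples)

    scored = []
    for (verb, comp), freq in pair_counts.items():
        if freq < min_freq:
            continue  # PMI is unreliable for very rare pairs
        # PMI = log2( P(verb, comp) / (P(verb) * P(comp)) )
        pmi = math.log2(freq * total / (verb_counts[verb] * comp_counts[comp]))
        scored.append(((verb, comp[0], comp[1]), freq, pmi))

    # Highest association first
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored


# Hypothetical toy data; in practice the triples would come from a
# POS-tagged and stemmed corpus. ACC stands for the accusative case mark.
toy = (
    [("vesz", "rész", "ACC")] * 4      # "részt vesz" (to take part)
    + [("vesz", "kenyér", "ACC")] * 1
    + [("eszik", "kenyér", "ACC")] * 3
    + [("eszik", "alma", "ACC")] * 2
    + [("eszik", "hús", "ACC")] * 1
)

ranked = rank_candidates(toy)
for triple, freq, pmi in ranked:
    print(triple, freq, round(pmi, 3))
```

A frequency cutoff is used because PMI tends to overrate pairs seen only once or twice. A real experiment would likely compare several association measures rather than rely on one.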

Balázs Kis, Begoña Villada, Gosse Bouma, Gábor Ugray, Tamás Bíró, Gábor Pohl, John Nerbonne

CLIN 2003 | Computational Linguistics | Corpus Preparation | Hungarian Multi-word Lexemes | Statistical Methods

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	CLIN
Authors	Balázs Kis, Begoña Villada, Gosse Bouma, Gábor Ugray, Tamás Bíró, Gábor Pohl, John Nerbonne
