Sciweavers

LREC
2008

Czech MWE Database

14 years 1 months ago
Czech MWE Database
In this paper we deal with a recently developed large Czech MWE database containing at the moment 160 000 MWEs (treated as lexical units). It was compiled from various resources such as encyclopedias and dictionaries, public databases of proper names and toponyms, collocations obtained from Czech WordNet, lists of botanical and zoological terms and others. We describe the structure of the database and give basic types of MWEs according to domains they belong to. We compare the built MWEs database with the corpus data from Czech National Corpus (approx. 100 mil. tokens) and present results of this comparison in the paper. These MWEs have not been obtained from the corpus since their frequencies in it are rather low. To obtain a more complete list of MWEs we propose and use a technique exploiting the Word Sketch Engine, which allows us to work with statistical parameters such as frequency of MWEs and their components as well as with the salience for the whole MWEs. We also discuss explo...
Karel Pala, Lukás Svoboda, Pavel Smerk
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC
Authors Karel Pala, Lukás Svoboda, Pavel Smerk
Comments (0)