Czech MWE Database

This paper presents a recently developed large database of Czech multi-word expressions (MWEs), which currently contains 160 000 MWEs treated as lexical units. It was compiled from various resources, including encyclopedias and dictionaries, public databases of proper names and toponyms, collocations obtained from the Czech WordNet, and lists of botanical and zoological terms. We describe the structure of the database and classify the basic types of MWEs by the domains they belong to. We compare the database with data from the Czech National Corpus (approx. 100 million tokens) and present the results of this comparison. The MWEs were not extracted from the corpus because their frequencies in it are rather low. To obtain a more complete list of MWEs, we propose and use a technique based on the Word Sketch Engine, which lets us work with statistical parameters such as the frequencies of MWEs and of their components, as well as the salience of whole MWEs. We also discuss explo...
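As a rough illustration of the statistics mentioned in the abstract (frequency of an MWE, frequency of its components, and a salience score for the whole expression), here is a minimal Python sketch. It is not the authors' method and does not call the Word Sketch Engine. The function names, the toy corpus, and the choice of logDice as the salience measure are all assumptions made for illustration, since the abstract does not say how salience is computed. Only two-word MWEs are handled.

```python
import math
from collections import Counter


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    Assumption: used here as a stand-in for "salience"; the abstract
    does not specify which measure the authors used.
    """
    if f_xy == 0:
        return float("-inf")  # the pair never occurs in the corpus
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def score_bigram_mwes(tokens, mwes):
    """Return {mwe: (corpus frequency, logDice)} for two-word MWEs.

    `tokens` is a tokenized corpus; `mwes` is a list of database
    entries written as space-separated strings (hypothetical format).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    results = {}
    for mwe in mwes:
        parts = tuple(mwe.split())
        if len(parts) != 2:
            continue  # longer MWEs are outside this sketch
        f_xy = bigrams[parts]
        score = log_dice(f_xy, unigrams[parts[0]], unigrams[parts[1]])
        results[mwe] = (f_xy, score)
    return results


# Toy corpus and database entries (invented for the example).
corpus = "karlova univerzita je v praze a karlova univerzita je stará".split()
database = ["karlova univerzita", "černý čaj"]
scores = score_bigram_mwes(corpus, database)
```

Entries with a frequency of zero correspond to the case the abstract describes, where a database MWE is rare or absent in the corpus and therefore could not have been found by corpus extraction alone.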

Karel Pala, Lukás Svoboda, Pavel Smerk

Czech MWE Database | Database | Education | Lexical Units | LREC 2008 |

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Karel Pala, Lukás Svoboda, Pavel Smerk

Sciweavers
