
LREC
2008

Enhancing an English-Polish Electronic Dictionary for Multiword Expression Research

This paper describes a project aimed at converting a legacy representation of English idioms into an XML-based format. The project is set in the context of a large electronic English-Polish dictionary which contains several hundred formalized idiom descriptions and which has been released under the terms of a free license. In short, the project consists of three phases: cleaning up the dictionary markup, extracting the legacy idiom representations, and converting them into TEI P5 XML constrained by a RelaxNG grammar created for this purpose and constituting a module that can be included as part of the TEI P5 schema. The paper contains general descriptions of the individual phases and several examples of XML-encoded idioms. It also suggests some directions for further research, which include abstracting the XML-ized idiom representations into general syntactic patterns and using the representations to automatically identify idioms in tagged corpora.
Piotr Banski, Radoslaw Moszczynski
Type Conference
Year 2008
Where LREC
Authors Piotr Banski, Radoslaw Moszczynski