This paper presents a method for compiling a large-scale bilingual corpus from a database of movie subtitles. To create the corpus, we propose an algorithm based on Gale and Churc...
Terms, term relevances, and sentence relevances are concepts that figure in many NLP applications, such as Text Summarization. These concepts are implemented in various ways, thou...
We describe an automatic projection algorithm for transferring frame-semantic information from English to Italian texts, as a first step towards the creation of Italian FrameNet. P...
In this paper we present JMWNL, a multilingual extension of the JWNL Java library, which was originally developed for accessing Princeton WordNet dictionaries. JMWNL broadens the ...
Maria Teresa Pazienza, Armando Stellato, Alexandra...
The present paper deals with the design and the annotation of a Greek real-world emotional speech corpus. The speech data consist of recordings collected during the interaction of...
Modern statistical parsers are trained on large annotated corpora (treebanks). These treebanks usually consist of sentences addressing different subdomains (e.g. sports, politics,...
Morphologically rich languages pose a challenge to the annotators of treebanks with respect to the status of orthographic (space-delimited) words in the syntactic parse trees. In s...
We describe various syntactic and semantic conditions for finding abstract nouns which refer to concepts of adjectives from a text, in an attempt to explore the creation of a thesaurus fr...
Kyoko Kanzaki, Francis Bond, Noriko Tomuro, Hitosh...
This paper describes a project aimed at converting a legacy representation of English idioms into an XML-based format. The project is set in the context of a large electronic Engl...
This paper describes the use of the CasSys platform in order to achieve the chunking of conversational speech transcripts by means of cascades of Unitex transducers. Our system is...