We describe the induction of lexical resources from unannotated corpora that are aligned with treebank grammars, providing a systematic correspondence between features in the lexi...
A new approach to handle unknown words in machine translation is presented. The basic idea is to find definitions for the unknown words on the source language side and translate t...
The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into Englis...
We develop a method for detecting errors in semantic predicate-argument annotation, based on the variation n-gram error detection method. After establishing an appropriate data re...
Many NLP modules and applications require the availability of a module for wide-coverage inflectional analysis. One way to obtain such analyses is to use an morphological analyser...
In this paper, we reported experiments of unsupervised automatic acquisition of Italian and English verb subcategorization frames (SCFs) from general and domain corpora. The propo...
Alessandro Lenci, Barbara McGillivray, Simonetta M...
In this paper, we describe an approach that aims to model heterogeneous resources for information extraction. Document is modeled in graph representation that enables better under...
In this paper we present our recent work to develop phonemic and syllabic inventories for Castilian Spanish based on the C-ORAL-ROM corpus, a spontaneous spoken Spanish with varyi...
Antonio Moreno-Sandoval, Doroteo Torre Toledano, R...
In this paper, we present a simple yet efficient automatic system to translate biomedical terms. It mainly relies on a machine learning approach able to infer rewriting rules from...