Sciweavers

LREC
2010
Identifying Paraphrases between Technical and Lay Corpora
In previous work, we presented a preliminary study to identify paraphrases between technical and lay discourse types from medical corpora dedicated to the French language. In this...
Louise Deléger, Pierre Zweigenbaum
LREC
2010
Capturing Coercions in Texts: a First Annotation Exercise
In this paper we report the first results of an annotation exercise of argument coercion phenomena performed on Italian texts. Our corpus consists of ca 4000 sentences from the PA...
Elisabetta Jezek, Valeria Quochi
LREC
2010
Maximum Entropy Classifier Ensembling using Genetic Algorithm for NER in Bengali
In this paper, we propose classifier ensemble selection for Named Entity Recognition (NER) as a single objective optimization problem. Thereafter, we develop a method based on gen...
Asif Ekbal, Sriparna Saha
LREC
2010
GRISP: A Massive Multilingual Terminological Database for Scientific and Technical Domains
The development of a multilingual terminology is a very long and costly process. We present the creation of a multilingual terminological database called GRISP covering multiple t...
Patrice Lopez, Laurent Romary
LREC
2010
Adapting Chinese Word Segmentation for Machine Translation Based on Short Units
In Chinese texts, words composed of single or multiple characters are not separated by spaces, unlike in most Western languages. Therefore, Chinese word segmentation is considered an ...
Yiou Wang, Kiyotaka Uchimoto, Jun'ichi Kazama, Can...
LREC
2010
Technical Infrastructure at Linguistic Data Consortium: Software and Hardware Resources for Linguistic Data Creation
Linguistic Data Consortium (LDC) at the University of Pennsylvania has participated as a data provider in a variety of government-sponsored programs that support development of Hum...
Kazuaki Maeda, Haejoong Lee, Stephen Grimes, Jonat...
LREC
2010
Developing a Deep Linguistic Databank Supporting a Collection of Treebanks: the CINTIL DeepGramBank
Corpora of sentences annotated with grammatical information have been deployed by extending the basic lexical and morphological data with increasingly complex information, such as...
António Branco, Francisco Costa, Joã...
LREC
2010
Community-based Construction of Draft and Final Translation Corpus Through a Translation Hosting Site Minna no Hon'yaku (MNH)
In this paper we report a way of constructing a translation corpus that contains not only source and target texts, but draft and final versions of target texts, through the transl...
Takeshi Abekawa, Masao Utiyama, Eiichiro Sumita, K...
LREC
2010
Design, Compilation, and Preliminary Analyses of Balanced Corpus of Contemporary Written Japanese
Compilation of a 100-million-word balanced corpus called the Balanced Corpus of Contemporary Written Japanese (or BCCWJ) is underway at the National Institute for Japanese Langua...
Kikuo Maekawa, Makoto Yamazaki, Takehiko Maruyama,...
LREC
2010
Ways of Evaluation of the Annotators in Building the Prague Czech-English Dependency Treebank
In this paper, we present several ways to measure and evaluate the annotation and annotators, proposed and used during the building of the Czech part of the Prague Czech-English D...
Marie Mikulová, Jan Štěpánek