
113 views, LREC 2008
13 years 11 months ago
Reusable Tagset Conversion Using Tagset Drivers
Part-of-speech or morphological tags are important means of annotation in a vast number of corpora. However, different sets of tags are used in different corpora, even for the sam...
Daniel Zeman
157 views, LREC 2008
13 years 11 months ago
AnCora: Multilevel Annotated Corpora for Catalan and Spanish
This paper presents AnCora, a multilingual corpus annotated at different linguistic levels consisting of 500,000 words in Catalan (AnCora-Ca) and in Spanish (AnCora-Es). At presen...
Mariona Taulé, Maria Antònia Martí...
131 views, LREC 2008
13 years 11 months ago
From D-Coi to SoNaR: a reference corpus for Dutch
The computational linguistics community in The Netherlands and Belgium has long recognized the dire need for a major reference corpus of written Dutch. In part to answer this need...
Nelleke Oostdijk, Martin Reynaert, Paola Monachesi...
110 views, LREC 2008
13 years 11 months ago
Spatiotemporal Coding in ANVIL
We present a new coding mechanism, spatiotemporal coding, that allows coders to annotate points and regions in the video frame by drawing directly on the screen. Coders can not on...
Michael Kipp
155 views, LREC 2008
13 years 11 months ago
OpenCCG Workbench and Visualization Tool
Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism which is expressed by syntactic category, a logical form representation. There are difficulties in represen...
Thepchai Supnithi, Suchinder Singh, Taneth Ruangra...
93 views, LREC 2008
13 years 11 months ago
Towards a Glossary of Activities in the Ontology Engineering Field
The Semantic Web of the future will be characterized by using a very large number of ontologies embedded in ontology networks. It is important to provide strong methodological sup...
María del Carmen Suárez-Figueroa, As...
108 views, LREC 2008
13 years 11 months ago
Exploiting Lexical Resources for Disambiguating CJK and Arabic Orthographic Variants
The orthographical complexities of Chinese, Japanese, Korean (CJK) and Arabic pose a special challenge to developers of NLP applications. These difficulties are exacerbated by the...
Jack Halpern
91 views, LREC 2008
13 years 11 months ago
What is poorly Said is a Little Funny
We implement several different methods for generating jokes in English. The common theme is to intentionally produce poor utterances by breaking Grice's maxims of conversatio...
Jonas Sjöbergh, Kenji Araki
108 views, LREC 2008
13 years 11 months ago
Comparing Dependency and Constituent Syntax for Frame-semantic Analysis
We address the question of which syntactic representation is best suited for role-semantic analysis of English in the FrameNet paradigm. We compare systems based on dependencies a...
Richard Johansson, Pierre Nugues