AnCora: Multilevel Annotated Corpora for Catalan and Spanish


This paper presents AnCora, a multilingual corpus annotated at different linguistic levels, consisting of 500,000 words in Catalan (AnCora-Ca) and in Spanish (AnCora-Es). At present, AnCora is the largest multilayer annotated corpus of these languages, freely available from http://clic.ub.edu/ancora. The two corpora consist mainly of newspaper texts annotated at different levels of linguistic description: morphological (PoS and lemmas), syntactic (constituents and functions), and semantic (argument structures, thematic roles, semantic verb classes, named entities, and WordNet nominal senses). All resulting layers are independent of each other, which makes data management easier. The annotation was performed manually, semiautomatically, or fully automatically, depending on the encoded linguistic information. The development of these basic resources was a primary objective, since such resources were lacking for these languages. A second goal was the definition of a consi...
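
As a rough illustration of what independent annotation layers can mean for data management, the following Python sketch keeps each layer as a separate mapping from token spans to labels. It is not AnCora's actual file format or tag set: the `Sentence` class, the layer names, the labels, and the example sentence are all hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


@dataclass
class Sentence:
    """Hypothetical container: tokens plus independent annotation layers."""
    tokens: List[str]
    # layer name -> {token span -> label}; layers never reference each other
    layers: Dict[str, Dict[Span, str]] = field(default_factory=dict)

    def annotate(self, layer: str, start: int, end: int, label: str) -> None:
        """Attach a label to tokens[start:end] in the given layer."""
        if not (0 <= start < end <= len(self.tokens)):
            raise ValueError("span out of range")
        self.layers.setdefault(layer, {})[(start, end)] = label

    def drop_layer(self, layer: str) -> None:
        """Remove one layer without touching any other layer."""
        self.layers.pop(layer, None)

    def labels(self, layer: str) -> List[Tuple[str, str]]:
        """Return (text, label) pairs for one layer, in token order."""
        items = sorted(self.layers.get(layer, {}).items())
        return [(" ".join(self.tokens[s:e]), lab) for (s, e), lab in items]


# Illustrative sentence and labels (assumptions, not AnCora data).
sent = Sentence(tokens=["Maria", "vive", "en", "Barcelona"])
for i, tag in enumerate(["noun", "verb", "preposition", "noun"]):
    sent.annotate("pos", i, i + 1, tag)
sent.annotate("named_entity", 0, 1, "person")
sent.annotate("named_entity", 3, 4, "location")

entities_before = sent.labels("named_entity")
sent.drop_layer("pos")  # removing one layer leaves the others intact
entities_after = sent.labels("named_entity")
```

Because every layer refers only to token offsets, a layer can be added, replaced, or removed on its own, which is the data-management benefit the abstract points to.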

Mariona Taulé, Maria Antònia Martí, Marta Recasens


Annotated | Education | LREC 2008 | Multilayer Annotated Corpus | Present Ancora |


» Analysis of Joint Inference Strategies for the Semantic Role Labeling of Spanish and Catal...

» Broad Coverage Multilingual Deep Sentence Generation with a Stochastic MultiLevel Realizer

» Learning Morphology of Romance Germanic and Slavic Languages with the Tool Linguistica

» Wikicorpus A WordSense Disambiguated Multilingual Wikipedia Corpus

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Mariona Taulé, Maria Antònia Martí, Marta Recasens


Sciweavers
